Methods for scarless introduction of targeted modifications into targeting vectors

ABSTRACT

Methods for introducing a scarless targeted genetic modification into a preexisting targeting vector are provided. The methods can use combinations of bacterial homologous recombination (BHR) and in vitro assembly to introduce such targeted genetic modifications into a preexisting targeting vector in a scarless manner.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Application No. 62/829,327, filed Apr. 4, 2019, which is herein incorporated by reference in its entirety for all purposes.

REFERENCE TO A SEQUENCE LISTING SUBMITTED AS A TEXT FILE VIA EFS WEB

The Sequence Listing written in file 545017SEQLIST.txt is 20.7 kilobytes, was created on Apr. 2, 2020, and is hereby incorporated by reference.

BACKGROUND

Seamless DNA construction is of particular importance when creating transgenic animal lines, as the scars produced by restriction sites or other manipulations can negatively impact gene expression if they land in a region important for regulation. Targeting the mammalian genome often requires construction of large targeting vectors with long DNA arms to direct homologous recombination, as well as antibiotic resistance cassettes for selection of embryonic stem cell clones. Correctly targeted clones often contain multiple scars necessary for construction of the vector, not to mention the resistance cassette itself. For genetic ablation, these lesions may not matter for the end result (a null allele), but there is always a chance that expression of neighboring genes will be adversely affected. For modifications other than knock-out, such as knock-in, faithful expression of the targeted locus is usually important for the studies in question.

In particular, humanization, the direct replacement of a mouse gene with its human counterpart, requires seamless junctions between mouse and human sequence so that mouse transcription machinery will faithfully replicate expression of the new allele. Care must be taken to bury construction scars and the selection cassette in non-coding regions that do not impact gene regulation. As animal models become more complex, more modifications may be added on top of existing ones, such as human disease-causing mutations on top of humanized alleles. The additional changes can then add even more scars and another selection cassette to an already highly engineered mouse locus, increasing the likelihood that expression will be altered and the mouse model will not be faithful to human disease. From a construction standpoint, adding a new cassette to a vector already containing one can become complicated due to undesired recombination between shared cassette elements such as promoters and poly(A) signals, even if the two cassettes encode different selections. Consequently, new methods are needed to simplify the generation of targeting vectors carrying multiple changes (such as a humanized allele and a disease mutation layered on top) and to minimize the scars incorporated into a final animal model.

SUMMARY

Methods of scarless introduction of a targeted genetic modification into a preexisting targeting vector are provided.

In one aspect, some such methods comprise: (a) performing bacterial homologous recombination between the preexisting targeting vector and a modification cassette in a population of bacterial cells, wherein the modification cassette comprises the targeted genetic modification and comprises an insert nucleic acid flanked by a 5′ homology arm corresponding to a 5′ target sequence in the preexisting targeting vector and a 3′ homology arm corresponding to a 3′ target sequence in the preexisting vector, wherein the insert nucleic acid comprises from 5′ to 3′: (i) a first repeat sequence; (ii) a first target site for a first nuclease agent; (iii) a selection cassette; (iv) a second target site for a second nuclease agent; and (v) a second repeat sequence identical to the first repeat sequence; (b) selecting bacterial cells comprising a modified targeting vector comprising the selection cassette; (c) cleaving the first target site in the modified targeting vector with the first nuclease agent and cleaving the second target site in the modified targeting vector with the second nuclease agent to remove the selection cassette and expose the first repeat sequence and the second repeat sequence in the modified targeting vector; and (d) assembling the exposed first repeat sequence with the exposed second repeat sequence in an intramolecular in vitro assembly reaction to generate the targeting vector comprising the scarless targeted genetic modification, wherein neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent is present and only a single copy of the repeat sequence is present in the targeting vector comprising the scarless targeted genetic modification.

In some such methods, the repeat sequence is identical to a sequence in the preexisting targeting vector. In some such methods, the targeted genetic modification comprises an insertion, and the repeat sequence is identical to the 5′ end or the 3′ end of the insertion.

In some such methods, the repeat sequence is at least about 20 nucleotides in length. Optionally, the repeat sequence is between about 20 nucleotides and about 100 nucleotides in length.

In some such methods, the modification cassette is a linear, double-stranded nucleic acid. In some such methods, the modification cassette is from about 1 kb to about 15 kb in length. In some such methods, the 5′ homology arm and the 3′ homology arm are each at least about 35 nucleotides in length. In some such methods, the 5′ homology arm and the 3′ homology arm are each between about 35 nucleotides and about 500 nucleotides in length.

In some such methods, the first nuclease agent and/or the second nuclease agent is a rare-cutting nuclease agent. In some such methods, the first target site and/or the second target site is not present in the preexisting targeting vector. In some such methods, the first target site is identical to the second target site, and the first nuclease agent is identical to the second nuclease agent.

In some such methods, the first nuclease agent and/or the second nuclease agent comprises a rare-cutting restriction enzyme. Optionally, the rare-cutting restriction enzyme is NotI, XmaIII, SstII, SalI, NruI, NheI, Nb.BbvCI, BbvCI, AscI, AsiSI, FseI, PacI, PmeI, SbfI, SgrAI, SwaI, BspQI, SapI, SfiI, CspCI, AbsI, CciNI, FspAI, MauBI, MreI, MssI, PalAI, RgaI, RigI, SdaI, SfaAI, SgfI, SgrDI, SgsI, SmiI, SrfI, Sse232I, Sse8387I, LguI, PciSI, AarI, AjuI, AloI, BarI, PpiI, or PsrI.
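By way of illustration only, the requirement that a target site not be present in the preexisting targeting vector can be screened computationally before a rare-cutting enzyme is chosen. The following Python sketch is an assumption-laden example and not part of the described methods: the function names are hypothetical, only a handful of the listed enzymes are included, the vector is treated as a linear string (so a site spanning the origin of a circular vector would be missed), and because every site shown is palindromic only one strand is scanned.

```python
# Recognition sequences for a few of the listed rare-cutting enzymes.
# All of these sites are palindromic, so scanning one strand is sufficient.
RARE_CUTTER_SITES = {
    "NotI": "GCGGCCGC",
    "AscI": "GGCGCGCC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
    "SwaI": "ATTTAAAT",
    "FseI": "GGCCGGCC",
}


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of site in sequence, including overlapping ones."""
    sequence = sequence.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def enzymes_absent_from(vector_sequence: str) -> list[str]:
    """Return the enzymes whose site does not occur in the vector sequence."""
    return sorted(
        name
        for name, site in RARE_CUTTER_SITES.items()
        if count_sites(vector_sequence, site) == 0
    )


# Hypothetical toy vector that already carries one NotI site.
toy_vector = "ACGT" + "GCGGCCGC" + "TTGACA"
candidates = enzymes_absent_from(toy_vector)
```

An enzyme returned by such a screen would cut only at the sites introduced by the insert nucleic acid, which is the property the text relies on.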

In some such methods, the first nuclease agent and/or the second nuclease agent is a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein and a guide RNA (gRNA), a zinc finger nuclease (ZFN), a Transcription Activator-Like Effector Nuclease (TALEN), or an engineered meganuclease. Optionally, the first nuclease agent and/or the second nuclease agent is the Cas protein and the gRNA, wherein the Cas protein is Cas9, and wherein the gRNA comprises a CRISPR RNA (crRNA) that targets the first target site and/or the second target site and a trans-activating CRISPR RNA (tracrRNA).

In some such methods, the targeted genetic modification comprises a modification in the 5′ homology arm or the 3′ homology arm. In some such methods, the targeted genetic modification comprises a modification in the insert nucleic acid. In some such methods, the targeted genetic modification comprises a point mutation, a deletion, an insertion, a replacement, or a combination thereof.

In some such methods, the selection cassette imparts resistance to an antibiotic. Optionally, the selection cassette imparts resistance to ampicillin, chloramphenicol, tetracycline, kanamycin, spectinomycin, streptomycin, carbenicillin, bleomycin, erythromycin, or polymyxin B.

In some such methods, the preexisting targeting vector is a large targeting vector at least about 10 kb in length. Optionally, the preexisting targeting vector is at least about 100 kb in length.

In some such methods, the preexisting targeting vector comprises a second selection cassette. Optionally, the second selection cassette imparts resistance to an antibiotic. Optionally, the selection cassette in the modification cassette and the second selection cassette in the preexisting targeting vector each imparts resistance to a different antibiotic. Optionally, the second selection cassette allows for selection in both bacterial and mammalian cells.

In some such methods, step (c) occurs in vitro.

In some such methods, step (d) comprises: (i) contacting the modified targeting vector with an exonuclease to expose complementary sequences between the first repeat sequence and the second repeat sequence; (ii) annealing the exposed complementary sequences; (iii) extending the 3′ ends of the annealed complementary sequences; and (iv) ligating the annealed complementary sequences. Optionally, step (d) comprises incubating the modified targeting vector with an exonuclease, a DNA polymerase, and a DNA ligase.

Some such methods further comprise: (e) treating the targeting vector with the first nuclease agent and the second nuclease agent following the in vitro assembly in step (d) to verify that neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent is present.
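For illustration only, the sequence-level outcome of steps (c) and (d) of this aspect can be modeled with plain strings. The Python sketch below is a simplified model under stated assumptions and is not the enzymatic procedure itself: the function names and toy sequences are hypothetical, cleavage is modeled as removing both target sites in their entirety along with the selection cassette (in practice, residual site nucleotides at the cut ends would be removed during assembly), and the vector is treated as linear.

```python
def cleave_out_cassette(vector: str, site: str) -> tuple[str, str]:
    """Model step (c): cut at the two target sites and discard the cassette.

    Simplification: both copies of the site are removed entirely.
    """
    first = vector.find(site)
    last = vector.rfind(site)
    if first == -1 or first == last:
        raise ValueError("expected two copies of the target site")
    return vector[:first], vector[last + len(site):]


def assemble_on_repeat(upstream: str, downstream: str, repeat: str) -> str:
    """Model step (d): join the two exposed repeats into a single copy."""
    if not upstream.endswith(repeat) or not downstream.startswith(repeat):
        raise ValueError("repeat sequences are not exposed at the cut ends")
    return upstream + downstream[len(repeat):]


# Hypothetical toy sequences (assumptions for illustration only).
backbone_5 = "AAAACCCC"
backbone_3 = "GGGGTTTT"
repeat = "GATTACAGATTACAGATTACA"   # 21 nt
site = "GCGGCCGC"                  # a NotI site used as the target site
cassette = "ATGAGCCATATTCAACTT"    # placeholder for a selection cassette

modified_vector = (
    backbone_5 + repeat + site + cassette + site + repeat + backbone_3
)
up, down = cleave_out_cassette(modified_vector, site)
final_vector = assemble_on_repeat(up, down, repeat)
```

In this model the final sequence contains neither target site and exactly one copy of the repeat, which mirrors the scarless result recited above.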

In another aspect, some such methods comprise: (a) performing bacterial homologous recombination between the preexisting targeting vector and a deletion cassette in a population of bacterial cells, wherein the deletion cassette comprises an insert nucleic acid flanked by a 5′ homology arm corresponding to a 5′ target sequence in the preexisting targeting vector and a 3′ homology arm corresponding to a 3′ target sequence in the preexisting vector, wherein the 5′ target sequence and the 3′ target sequence flank a region of the preexisting targeting vector into which the targeted genetic modification is to be introduced, and wherein the insert nucleic acid comprises from 5′ to 3′: (i) a first target site for a first nuclease agent; (ii) a selection cassette; and (iii) a second target site for a second nuclease agent; (b) selecting bacterial cells comprising a modified targeting vector comprising the selection cassette; (c) cleaving the first target site in the modified targeting vector with the first nuclease agent and cleaving the second target site in the modified targeting vector with the second nuclease agent to remove the selection cassette and expose an upstream end sequence and a downstream end sequence in the modified targeting vector; and (d) assembling in an in vitro assembly reaction the cleaved targeting vector with a modification cassette comprising the targeted genetic modification flanked by an upstream end sequence overlapping the upstream end sequence in the modified targeting vector and a downstream end sequence overlapping the downstream end sequence in the modified targeting vector to generate the targeting vector comprising the scarless targeted genetic modification, wherein neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent is present in the targeting vector comprising the scarless targeted genetic modification.

In some such methods, the deletion cassette is from about 1 kb to about 15 kb in length. In some such methods, the 5′ homology arm and the 3′ homology arm are each at least about 35 nucleotides in length. Optionally, the 5′ homology arm and the 3′ homology arm are each between about 35 nucleotides and about 500 nucleotides in length. In some such methods, the deletion cassette is a linear, double-stranded nucleic acid.

In some such methods, the first nuclease agent and/or the second nuclease agent is a rare-cutting nuclease agent. In some such methods, the first target site and/or the second target site is not present in the preexisting targeting vector. In some such methods, the first target site is identical to the second target site, and the first nuclease agent is identical to the second nuclease agent.

In some such methods, the first nuclease agent and/or the second nuclease agent comprises a rare-cutting restriction enzyme. Optionally, the rare-cutting restriction enzyme is NotI, XmaIII, SstII, SalI, NruI, NheI, Nb.BbvCI, BbvCI, AscI, AsiSI, FseI, PacI, PmeI, SbfI, SgrAI, SwaI, BspQI, SapI, SfiI, CspCI, AbsI, CciNI, FspAI, MauBI, MreI, MssI, PalAI, RgaI, RigI, SdaI, SfaAI, SgfI, SgrDI, SgsI, SmiI, SrfI, Sse232I, Sse8387I, LguI, PciSI, AarI, AjuI, AloI, BarI, PpiI, or PsrI.

In some such methods, the first nuclease agent and/or the second nuclease agent is a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein and a guide RNA (gRNA), a zinc finger nuclease (ZFN), a Transcription Activator-Like Effector Nuclease (TALEN), or an engineered meganuclease. Optionally, the first nuclease agent and/or the second nuclease agent is the Cas protein and the gRNA, wherein the Cas protein is Cas9, and wherein the gRNA comprises a CRISPR RNA (crRNA) that targets the first target site and/or the second target site and a trans-activating CRISPR RNA (tracrRNA).

In some such methods, the selection cassette imparts resistance to an antibiotic. Optionally, the selection cassette imparts resistance to ampicillin, chloramphenicol, tetracycline, kanamycin, spectinomycin, streptomycin, carbenicillin, bleomycin, erythromycin, or polymyxin B.

In some such methods, the preexisting targeting vector is a large targeting vector at least 10 kb in length. Optionally, the preexisting targeting vector is at least 100 kb in length.

In some such methods, the preexisting targeting vector comprises a second selection cassette. Optionally, the second selection cassette imparts resistance to an antibiotic. Optionally, the selection cassette in the deletion cassette and the second selection cassette in the preexisting targeting vector each imparts resistance to a different antibiotic. Optionally, the second selection cassette allows for selection in both bacterial and mammalian cells.

In some such methods, the overlap between the upstream end sequence in the modification cassette and the upstream end sequence in the modified targeting vector and/or the overlap between the downstream end sequence in the modification cassette and the downstream end sequence in the modified targeting vector is at least about 20 nucleotides in length. In some such methods, the overlap between the upstream end sequence in the modification cassette and the upstream end sequence in the modified targeting vector and/or the overlap between the downstream end sequence in the modification cassette and the downstream end sequence in the modified targeting vector is between about 20 and about 100 nucleotides in length.

In some such methods, step (c) occurs in vitro.

In some such methods, step (d) comprises: (i) contacting the cleaved targeting vector and the modification cassette with an exonuclease to expose complementary sequences between the end sequences in the modified targeting vector and the end sequences in the modification cassette; (ii) annealing the exposed complementary sequences; (iii) extending the 3′ ends of the annealed complementary sequences; and (iv) ligating the annealed complementary sequences. Optionally, step (d) comprises incubating the cleaved targeting vector and the modification cassette with an exonuclease, a DNA polymerase, and a DNA ligase.

In some such methods, the modification cassette is a linear, double-stranded nucleic acid. In some such methods, the modification cassette is at least about 200 nucleotides in length. In some such methods, the modification cassette is a size that cannot be directly synthesized or generated by polymerase chain reaction. In some such methods, the modification cassette is at least about 10 kb in length.

In some such methods, the targeted genetic modification comprises a point mutation, a deletion, an insertion, a replacement, or a combination thereof.

Some such methods further comprise: (e) treating the targeting vector with the first nuclease agent and the second nuclease agent following the in vitro assembly in step (d) to verify that neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent is present.
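For illustration only, the intermolecular assembly of step (d) in this second aspect can be modeled as joining sequences on their overlapping ends. The Python sketch below is a string-level simplification and not the enzymatic reaction: the function name, the toy sequences, and the default minimum overlap of 20 nucleotides (taken from the "at least about 20 nucleotides" language above) are assumptions, and the cleaved targeting vector is represented as two linear flanks rather than as one linearized circular molecule.

```python
def join_by_overlap(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two sequences on the longest end overlap of at least min_overlap.

    The suffix of `left` must equal the prefix of `right`; the shared
    overlap is kept only once in the joined product.
    """
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("no overlap of the required length was found")


# Hypothetical toy sequences (assumptions for illustration only).
upstream_end = "ACGTTGCAAGCTTGCATGCC"     # 20 nt exposed on the vector
downstream_end = "GGATCCTCTAGAGTCGACCT"   # 20 nt exposed on the vector
vector_left = "TTGACA" + upstream_end     # cleaved vector, upstream flank
vector_right = downstream_end + "CATTGA"  # cleaved vector, downstream flank

# Modification cassette: targeted modification flanked by the same ends.
modification = "ACGTAACGT"
modification_cassette = upstream_end + modification + downstream_end

assembled = join_by_overlap(vector_left, modification_cassette)
assembled = join_by_overlap(assembled, vector_right)
```

Because each overlap is retained only once, the modeled product carries the modification with no additional sequence at either junction.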

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 (not to scale) shows a schematic of a method for scarless introduction of a point mutation into a large targeting vector via bacterial homologous recombination and intramolecular Gibson assembly.

FIG. 2 (not to scale) shows a schematic of a synthesized nucleic acid to be used in the method shown in the schematic in FIG. 1.

FIG. 3 (not to scale) shows a schematic of a method for scarless introduction of a point mutation into a large targeting vector via bacterial homologous recombination and intermolecular Gibson assembly.

FIGS. 4A-4B show a traditional targeting strategy using modified mouse BACs as vectors and self-deleting cassette technology, from vector construction (FIG. 4A) through F1 mouse generation (FIG. 4B). Deletion of the cassette via mouse protamine-expressed Cre recombinase leaves a 78 bp scar containing a single loxP.

DEFINITIONS

The terms “protein,” “polypeptide,” and “peptide,” used interchangeably herein, include polymeric forms of amino acids of any length, including coded and non-coded amino acids and chemically or biochemically modified or derivatized amino acids. The terms also include polymers that have been modified, such as polypeptides having modified peptide backbones. The term “domain” refers to any part of a protein or polypeptide having a particular function or structure.

The terms “nucleic acid” and “polynucleotide,” used interchangeably herein, include polymeric forms of nucleotides of any length, including ribonucleotides, deoxyribonucleotides, or analogs or modified versions thereof. They include single-, double-, and multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, and polymers comprising purine bases, pyrimidine bases, or other natural, chemically modified, biochemically modified, non-natural, or derivatized nucleotide bases.

The term “targeting vector” refers to a recombinant nucleic acid that can be introduced by homologous recombination, non-homologous-end-joining-mediated ligation, or any other means of recombination to a target position in the genome of a cell.

The term “wild type” includes entities having a structure and/or activity as found in a normal (as contrasted with mutant, diseased, altered, or so forth) state or context. Wild type genes and polypeptides often exist in multiple different forms (e.g., alleles).

The term “endogenous sequence” refers to a nucleic acid sequence that occurs naturally within a cell or non-human animal. For example, an endogenous Rosa26 sequence of a non-human animal refers to a native Rosa26 sequence that naturally occurs at the Rosa26 locus in the non-human animal.

“Exogenous” molecules or sequences include molecules or sequences that are not normally present in a cell in that form or location (e.g., genomic locus). Normal presence includes presence with respect to the particular developmental stage and environmental conditions of the cell. An exogenous molecule or sequence, for example, can include a mutated version of a corresponding endogenous sequence within the cell, such as a humanized version of the endogenous sequence, or can include a sequence corresponding to an endogenous sequence within the cell but in a different form (i.e., not within a chromosome). In contrast, endogenous molecules or sequences include molecules or sequences that are normally present in that form and location in a particular cell at a particular developmental stage under particular environmental conditions.

The term “heterologous” when used in the context of a nucleic acid or a protein indicates that the nucleic acid or protein comprises at least two segments that do not naturally occur together in the same molecule. For example, the term “heterologous,” when used with reference to segments of a nucleic acid or segments of a protein, indicates that the nucleic acid or protein comprises two or more sub-sequences that are not found in the same relationship to each other (e.g., joined together) in nature. As one example, a “heterologous” region of a nucleic acid vector is a segment of nucleic acid within or attached to another nucleic acid molecule that is not found in association with the other molecule in nature. For example, a heterologous region of a nucleic acid vector could include a coding sequence flanked by sequences not found in association with the coding sequence in nature. Likewise, a “heterologous” region of a protein is a segment of amino acids within or attached to another peptide molecule that is not found in association with the other peptide molecule in nature (e.g., a fusion protein, or a protein with a tag). Similarly, a nucleic acid or protein can comprise a heterologous label or a heterologous secretion or localization sequence.

“Codon optimization” takes advantage of the degeneracy of codons, as exhibited by the multiplicity of three-base pair codon combinations that specify an amino acid, and generally includes a process of modifying a nucleic acid sequence for enhanced expression in particular host cells by replacing at least one codon of the native sequence with a codon that is more frequently or most frequently used in the genes of the host cell while maintaining the native amino acid sequence. For example, a nucleic acid encoding a Cas9 protein can be modified to substitute codons having a higher frequency of usage in a given prokaryotic or eukaryotic cell, including a bacterial cell, a yeast cell, a human cell, a non-human cell, a mammalian cell, a rodent cell, a mouse cell, a rat cell, a hamster cell, or any other host cell, as compared to the naturally occurring nucleic acid sequence. Codon usage tables are readily available, for example, at the “Codon Usage Database.” These tables can be adapted in a number of ways. See Nakamura et al. (2000) Nucleic Acids Res. 28:292, herein incorporated by reference in its entirety for all purposes. Computer algorithms for codon optimization of a particular sequence for expression in a particular host are also available (see, e.g., Gene Forge).
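As a minimal illustration of the codon replacement described above, the following Python sketch swaps each codon for a preferred synonymous codon while preserving the encoded amino acid sequence. The preferred-codon table is a hypothetical placeholder and does not reflect measured usage in any host, the genetic code table is deliberately partial, and the function names are assumptions made for the example.

```python
# Partial standard genetic code covering only the codons used below.
CODON_TO_AA = {
    "ATG": "M",
    "TTA": "L", "CTA": "L", "CTG": "L",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TGA": "*",
}

# Hypothetical host preferences, for illustration only. A real table
# would be derived from codon usage data for the chosen host cell.
PREFERRED_CODON = {"M": "ATG", "L": "CTG", "K": "AAG", "*": "TGA"}


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into consecutive three-base codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence using the partial table above."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(cds))


def codon_optimize(cds: str) -> str:
    """Replace every codon with the preferred codon for the same residue."""
    return "".join(
        PREFERRED_CODON[CODON_TO_AA[codon]] for codon in split_codons(cds)
    )


native = "ATGTTAAAATAA"
optimized = codon_optimize(native)
```

The check that matters in such a sketch is that `translate(optimized)` equals `translate(native)`, that is, the nucleotide sequence changes while the native amino acid sequence is maintained.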

The term “locus” refers to a specific location of a gene (or significant sequence), DNA sequence, polypeptide-encoding sequence, or position on a chromosome of the genome of an organism. For example, a Rosa26 locus may refer to the specific location of a Rosa26 gene, Rosa26 DNA sequence, or Rosa26 position on a chromosome of the genome of an organism that has been identified as to where such a sequence resides. A “Rosa26 locus” may comprise a regulatory element of a Rosa26 gene, including, for example, an enhancer, a promoter, 5′ and/or 3′ untranslated region (UTR), or a combination thereof.

The term “gene” refers to a DNA sequence in a chromosome that codes for a product (e.g., an RNA product and/or a polypeptide product) and includes the coding region interrupted with non-coding introns and sequence located adjacent to the coding region on both the 5′ and 3′ ends such that the gene corresponds to the full-length mRNA (including the 5′ and 3′ untranslated sequences). The term “gene” also includes other non-coding sequences including regulatory sequences (e.g., promoters, enhancers, and transcription factor binding sites), polyadenylation signals, internal ribosome entry sites, silencers, insulating sequence, and matrix attachment regions. These sequences may be close to the coding region of the gene (e.g., within 10 kb) or at distant sites, and they influence the level or rate of transcription and translation of the gene.

A “promoter” is a regulatory region of DNA usually comprising a TATA box capable of directing RNA polymerase II to initiate RNA synthesis at the appropriate transcription initiation site for a particular polynucleotide sequence. A promoter may additionally comprise other regions which influence the transcription initiation rate. The promoter sequences disclosed herein modulate transcription of an operably linked polynucleotide. A promoter can be active in one or more of the cell types disclosed herein (e.g., a prokaryotic cell or a eukaryotic cell (such as a mammalian cell), or a combination thereof). A promoter can be, for example, a constitutively active promoter, a conditional promoter, an inducible promoter, a temporally restricted promoter (e.g., a developmentally regulated promoter), or a spatially restricted promoter (e.g., a cell-specific or tissue-specific promoter). Examples of promoters can be found, for example, in WO 2013/176772, herein incorporated by reference in its entirety for all purposes.

“Operable linkage” or being “operably linked” includes juxtaposition of two or more components (e.g., a promoter and another sequence element) such that both components function normally and allow the possibility that at least one of the components can mediate a function that is exerted upon at least one of the other components. For example, a promoter can be operably linked to a coding sequence if the promoter controls the level of transcription of the coding sequence in response to the presence or absence of one or more transcriptional regulatory factors. Operable linkage can include such sequences being contiguous with each other or acting in trans (e.g., a regulatory sequence can act at a distance to control transcription of the coding sequence).

“Complementarity” of nucleic acids means that a nucleotide sequence in one strand of nucleic acid, due to orientation of its nucleobase groups, forms hydrogen bonds with another sequence on an opposing nucleic acid strand. The complementary bases in DNA are typically A with T and C with G. In RNA, they are typically C with G and U with A. Complementarity can be perfect or substantial/sufficient. Perfect complementarity between two nucleic acids means that the two nucleic acids can form a duplex in which every base in the duplex is bonded to a complementary base by Watson-Crick pairing. “Substantial” or “sufficient” complementarity means that a sequence in one strand is not completely and/or perfectly complementary to a sequence in an opposing strand, but that sufficient bonding occurs between bases on the two strands to form a stable hybrid complex in a set of hybridization conditions (e.g., salt concentration and temperature). Such conditions can be predicted by using the sequences and standard mathematical calculations to predict the Tm (melting temperature) of hybridized strands, or by empirical determination of Tm by using routine methods. Tm includes the temperature at which a population of hybridization complexes formed between two nucleic acid strands is 50% denatured (i.e., a population of double-stranded nucleic acid molecules becomes half dissociated into single strands). At a temperature below the Tm, formation of a hybridization complex is favored, whereas at a temperature above the Tm, melting or separation of the strands in the hybridization complex is favored. Tm may be estimated for a nucleic acid having a known G+C content in an aqueous 1 M NaCl solution by using, e.g., Tm = 81.5 + 0.41(% G+C), although other known Tm computations take into account nucleic acid structural characteristics.
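As a worked illustration of the estimate Tm = 81.5 + 0.41(% G+C), the following Python sketch computes the G+C percentage of a sequence and applies the formula. The function names are hypothetical, the result is read in degrees Celsius as is conventional for this formula, and the sketch ignores length and structural effects that the other Tm computations mentioned above take into account.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of G and C bases in a nucleic acid sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc_count = sum(base in "GC" for base in sequence)
    return 100.0 * gc_count / len(sequence)


def estimate_tm(sequence: str) -> float:
    """Estimate Tm with Tm = 81.5 + 0.41(% G+C), assuming 1 M NaCl."""
    return 81.5 + 0.41 * gc_percent(sequence)


# A toy sequence with 50% G+C content: 81.5 + 0.41 * 50 = 102.0
tm_example = estimate_tm("ATGC" * 5)
```

Under this estimate, each additional percentage point of G+C content raises the predicted Tm by 0.41 degrees.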

Hybridization requires that the two nucleic acids contain complementary sequences, although mismatches between bases are possible. The conditions appropriate for hybridization between two nucleic acids depend on the length of the nucleic acids and the degree of complementation, variables which are well known. The greater the degree of complementation between two nucleotide sequences, the greater the value of the melting temperature (Tm) for hybrids of nucleic acids having those sequences. For hybridizations between nucleic acids with short stretches of complementarity (e.g., complementarity over 35 or fewer, 30 or fewer, 25 or fewer, 22 or fewer, 20 or fewer, or 18 or fewer nucleotides) the position of mismatches becomes important (see Sambrook et al., supra, 11.7-11.8). Typically, the length for a hybridizable nucleic acid is at least about 10 nucleotides. Illustrative minimum lengths for a hybridizable nucleic acid include at least about 15 nucleotides, at least about 20 nucleotides, at least about 22 nucleotides, at least about 25 nucleotides, and at least about 30 nucleotides. Furthermore, the temperature and wash solution salt concentration may be adjusted as necessary according to factors such as length of the region of complementation and the degree of complementation.

The sequence of a polynucleotide need not be 100% complementary to that of its target nucleic acid to be specifically hybridizable. Moreover, a polynucleotide may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure). A polynucleotide (e.g., gRNA) can comprise at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it is targeted. For example, a gRNA in which 18 of 20 nucleotides are complementary to a target region, and would therefore specifically hybridize, would represent 90% complementarity. In this example, the remaining noncomplementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides.
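The 18-of-20 example above can be reproduced with a short calculation. The following Python sketch is illustrative only: it assumes an ungapped, equal-length, antiparallel comparison between an RNA guide sequence and a DNA target strand, and the function names and the toy guide sequence are hypothetical.

```python
# Watson-Crick partner in DNA for each RNA base.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def perfect_target(guide_rna: str) -> str:
    """Return the fully complementary DNA strand, written 5' to 3'."""
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(guide_rna))


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percent of guide positions paired with the antiparallel target.

    Both sequences are written 5' to 3' and must be the same length.
    """
    if len(guide_rna) != len(target_dna) or not guide_rna:
        raise ValueError("sequences must be non-empty and of equal length")
    target_3_to_5 = target_dna[::-1]
    paired = sum(
        RNA_TO_DNA_PARTNER[g] == t for g, t in zip(guide_rna, target_3_to_5)
    )
    return 100.0 * paired / len(guide_rna)


# Hypothetical 20-nt guide and a target carrying two mismatches.
guide = "GACUUGCAUGCCAGUUACGG"
target = list(perfect_target(guide))
for position in (0, 10):
    target[position] = "A" if target[position] != "A" else "C"
two_mismatch_target = "".join(target)
complementarity = percent_complementarity(guide, two_mismatch_target)
```

The two mismatches here are placed at arbitrary positions, consistent with the statement that noncomplementary nucleotides need not be contiguous.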

Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using BLAST programs (basic local alignment search tools) and PowerBLAST programs (Altschul et al. (1990) J. Mol. Biol. 215:403-410; Zhang and Madden (1997) Genome Res. 7:649-656, each of which is herein incorporated by reference in its entirety for all purposes) or by using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison, Wis.), using default settings, which uses the algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2:482-489, herein incorporated by reference in its entirety for all purposes.

The methods and compositions provided herein employ a variety of different components. Some components throughout the description can have active variants and fragments. Such components include, for example, Cas proteins, CRISPR RNAs, tracrRNAs, and guide RNAs. Biological activity for each of these components is described elsewhere herein. The term “functional” refers to the innate ability of a protein or nucleic acid (or a fragment or variant thereof) to exhibit a biological activity or function. Such biological activities or functions can include, for example, the ability of a Cas protein to bind to a guide RNA and to a target DNA sequence. The biological functions of functional fragments or variants may be the same or may in fact be changed (e.g., with respect to their specificity or selectivity or efficacy) in comparison to the original, but with retention of the basic biological function.

The term “variant” refers to a nucleotide sequence differing from the sequence most prevalent in a population (e.g., by one nucleotide) or a protein sequence different from the sequence most prevalent in a population (e.g., by one amino acid).

The term “fragment” when referring to a protein means a protein that is shorter or has fewer amino acids than the full-length protein. The term “fragment” when referring to a nucleic acid means a nucleic acid that is shorter or has fewer nucleotides than the full-length nucleic acid. A fragment can be, for example, an N-terminal fragment (i.e., removal of a portion of the C-terminal end of the protein), a C-terminal fragment (i.e., removal of a portion of the N-terminal end of the protein), or an internal fragment.

“Sequence identity” or “identity” in the context of two polynucleotides or polypeptide sequences makes reference to the residues in the two sequences that are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins, residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. When sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences that differ by such conservative substitutions are said to have “sequence similarity” or “similarity.” Means for making this adjustment are well known. Typically, this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif.).

“Percentage of sequence identity” includes the value determined by comparing two optimally aligned sequences (greatest number of perfectly matched residues) over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity. Unless otherwise specified (e.g., the shorter sequence includes a linked heterologous sequence), the comparison window is the full length of the shorter of the two sequences being compared.
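The calculation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the two sequences have already been optimally aligned by some other tool and that gaps are written as “-”; the function name and the gap symbol are assumptions, and the sketch does not perform the alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a comparison window of two pre-aligned sequences.

    Matched positions are columns where both sequences carry the same
    residue (a gap never counts as a match). The result is matched
    positions divided by total positions in the window, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


# Seven of eight positions match in this window.
identity = percent_identity("ACGTACGT", "ACGTTCGT")
```

For the eight-nucleotide window shown, seven positions match, giving 87.5% identity.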

Unless otherwise stated, sequence identity/similarity values include the value obtained using GAP Version 10 using the following parameters: % identity and % similarity for a nucleotide sequence using GAP Weight of 50 and Length Weight of 3, and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using GAP Weight of 8 and Length Weight of 2, and the BLOSUM62 scoring matrix; or any equivalent program thereof. “Equivalent program” includes any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide or amino acid residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by GAP Version 10.

The term “conservative amino acid substitution” refers to the substitution of an amino acid that is normally present in the sequence with a different amino acid of similar size, charge, or polarity. Examples of conservative substitutions include the substitution of a non-polar (hydrophobic) residue such as isoleucine, valine, or leucine for another non-polar residue. Likewise, examples of conservative substitutions include the substitution of one polar (hydrophilic) residue for another such as between arginine and lysine, between glutamine and asparagine, or between glycine and serine. Additionally, the substitution of a basic residue such as lysine, arginine, or histidine for another, or the substitution of one acidic residue such as aspartic acid or glutamic acid for another acidic residue are additional examples of conservative substitutions. Examples of non-conservative substitutions include the substitution of a non-polar (hydrophobic) amino acid residue such as isoleucine, valine, leucine, alanine, or methionine for a polar (hydrophilic) residue such as cysteine, glutamine, glutamic acid or lysine and/or a polar residue for a non-polar residue. Typical amino acid categorizations are summarized in Table 1 below.

TABLE 1 Amino Acid Categorizations.

Amino acid      Abbreviation  Symbol  Polarity  Charge    Hydropathy index
Alanine         Ala           A       Nonpolar  Neutral    1.8
Arginine        Arg           R       Polar     Positive  −4.5
Asparagine      Asn           N       Polar     Neutral   −3.5
Aspartic acid   Asp           D       Polar     Negative  −3.5
Cysteine        Cys           C       Nonpolar  Neutral    2.5
Glutamic acid   Glu           E       Polar     Negative  −3.5
Glutamine       Gln           Q       Polar     Neutral   −3.5
Glycine         Gly           G       Nonpolar  Neutral   −0.4
Histidine       His           H       Polar     Positive  −3.2
Isoleucine      Ile           I       Nonpolar  Neutral    4.5
Leucine         Leu           L       Nonpolar  Neutral    3.8
Lysine          Lys           K       Polar     Positive  −3.9
Methionine      Met           M       Nonpolar  Neutral    1.9
Phenylalanine   Phe           F       Nonpolar  Neutral    2.8
Proline         Pro           P       Nonpolar  Neutral   −1.6
Serine          Ser           S       Polar     Neutral   −0.8
Threonine       Thr           T       Polar     Neutral   −0.7
Tryptophan      Trp           W       Nonpolar  Neutral   −0.9
Tyrosine        Tyr           Y       Polar     Neutral   −1.3
Valine          Val           V       Nonpolar  Neutral    4.2

A “homologous” sequence (e.g., nucleic acid sequence) includes a sequence that is either identical or substantially similar to a known reference sequence, such that it is, for example, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identical to the known reference sequence. Homologous sequences can include, for example, orthologous sequences and paralogous sequences. Homologous genes, for example, typically descend from a common ancestral DNA sequence, either through a speciation event (orthologous genes) or a genetic duplication event (paralogous genes). “Orthologous” genes include genes in different species that evolved from a common ancestral gene by speciation. Orthologs typically retain the same function in the course of evolution. “Paralogous” genes include genes related by duplication within a genome. Paralogs can evolve new functions in the course of evolution.

The term “in vitro” includes artificial environments and processes or reactions that occur within an artificial environment (e.g., a test tube). The term “in vivo” includes natural environments (e.g., a cell or organism or body) and processes or reactions that occur within a natural environment. The term “ex vivo” includes cells that have been removed from the body of an individual and processes or reactions that occur within such cells.

Repair in response to double-strand breaks (DSBs) occurs principally through two conserved DNA repair pathways: homologous recombination (HR) and non-homologous end joining (NHEJ). See Kasparek & Humphrey (2011) Seminars in Cell & Dev. Biol. 22:886-897, herein incorporated by reference in its entirety for all purposes. Likewise, repair of a target nucleic acid mediated by an exogenous donor nucleic acid can include any process of exchange of genetic information between the two polynucleotides.

The term “recombination” includes any process of exchange of genetic information between two polynucleotides and can occur by any mechanism. Recombination can occur via homology directed repair (HDR) or homologous recombination (HR). HDR or HR includes a form of nucleic acid repair that can require nucleotide sequence homology, uses a “donor” molecule as a template for repair of a “target” molecule (i.e., the one that experienced the double-strand break), and leads to transfer of genetic information from the donor to target. Without wishing to be bound by any particular theory, such transfer can involve mismatch correction of heteroduplex DNA that forms between the broken target and the donor, and/or synthesis-dependent strand annealing, in which the donor is used to resynthesize genetic information that will become part of the target, and/or related processes. In some cases, the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA. See Wang et al. (2013) Cell 153:910-918; Mandalos et al. (2012) PLOS ONE 7:e45768:1-9; and Wang et al. (2013) Nat Biotechnol. 31:530-532, each of which is herein incorporated by reference in its entirety for all purposes.

Compositions or methods “comprising” or “including” one or more recited elements may include other elements not specifically recited. For example, a composition that “comprises” or “includes” a protein may contain the protein alone or in combination with other ingredients. The transitional phrase “consisting essentially of” means that the scope of a claim is to be interpreted to encompass the specified elements recited in the claim and those that do not materially affect the basic and novel characteristic(s) of the claimed invention. Thus, the term “consisting essentially of” when used in a claim of this invention is not intended to be interpreted to be equivalent to “comprising.”

“Optional” or “optionally” means that the subsequently described event or circumstance may or may not occur and that the description includes instances in which the event or circumstance occurs and instances in which it does not.

Designation of a range of values includes all integers within or defining the range, and all subranges defined by integers within the range.

Unless otherwise apparent from the context, the term “about” encompasses values within a standard margin of error of measurement (e.g., SEM) of a stated value.

The term “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations when interpreted in the alternative (“or”).

The term “or” refers to any one member of a particular list and also includes any combination of members of that list.

The singular forms of the articles “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a protein” or “at least one protein” can include a plurality of proteins, including mixtures thereof.

Statistically significant means p≤0.05.

DETAILED DESCRIPTION

I. Overview

Provided herein are methods for introducing a scarless targeted genetic modification into a preexisting targeting vector. The methods can use combinations of bacterial homologous recombination (BHR) and in vitro assembly methods (either intramolecular or intermolecular) to introduce such targeted genetic modifications into a targeting vector in a scarless manner. The term “scarless” refers to the fact that no changes or undesired sequences are introduced into assembled DNA by the reactions. The combined sequence will correspond to the exact sequence desired with no changes or artefacts being introduced by the BHR or in vitro assembly procedures.

One of the most effective approaches for determining gene function involves deliberately engineering gene mutations in mouse embryonic stem (ES) cells (or other non-human animal ES cells), and then generating mice (or other non-human animals) harboring the corresponding genetic changes. The two limiting steps are the generation of gene targeting vectors and the subsequent selection of rare ES cell clones in which the targeting vector has correctly altered the gene. To produce a desired genetic alteration in ES cells, one must first introduce the alteration into a targeting vector that is subsequently used to replace the native gene in ES cells by homologous recombination.

Scarless DNA construction is of particular importance when creating transgenic animal lines, as the scars produced by restriction sites or other manipulations can negatively impact gene expression if they land in a region important for regulation. Targeting the mammalian genome often requires construction of large targeting vectors with long DNA arms to direct homologous recombination, as well as antibiotic resistance cassettes for selection of embryonic stem cell clones. Correctly targeted clones often contain multiple scars necessary for construction of the vector and the resistance cassette itself. Even with self-deleting cassette technology, it is often not possible to avoid leaving exogenous sequence “scars” behind in modified loci. See, e.g., FIGS. 4A-4B. Such scars can affect faithful expression of the targeted locus or even the expression of neighboring genes. As animal models become more complex, more modifications may be added on top of existing ones, such as human disease-causing mutations on humanized alleles. The additional changes can then add even more scars and another selection cassette to an already highly engineered mouse locus, increasing the likelihood that expression will be altered and the mouse model will not be faithful. In addition, adding a new cassette to a vector already containing one can become complicated due to undesired recombination between shared cassette elements such as promoters and poly(A) signals, even if the two cassettes encode different selections. However, such selection cassettes are important so that time and resources do not have to be wasted screening thousands of ES cell clones for a desired modification.

Alternatively, using the initial targeting vector to create and screen modified ES cells comprising the modification from the initial targeting vector and then re-targeting those cells with a second targeting vector (e.g., ssODN) to make a second modification to the already targeted locus is time-consuming, and re-targeting (e.g., with ssODNs) can lead to undesired modifications such as undesired insertions, undesired deletions, undesired point mutations, or no targeting coupled with a transgenic insertion elsewhere in the genome.

The methods disclosed herein provide efficient and scarless methods for making modifications to preexisting targeting vectors at the stage of preparing the targeting vector instead of having to create and screen ES cells comprising the initial preexisting targeting vector, and then re-targeting those cells to make a second modification to the already targeted locus.

II. Scarless Introduction of a Targeted Modification into a Targeting Vector Via Bacterial Homologous Recombination and Intramolecular In Vitro Assembly

Some methods disclosed herein for scarless introduction of a targeted genetic modification into a preexisting targeting vector take advantage of in vitro assembly methods for intramolecular assembly. As one example, such methods can comprise performing bacterial homologous recombination between the preexisting targeting vector and a modification cassette in a population of bacterial cells. The modification cassette can comprise an insert nucleic acid flanked by a 5′ homology arm corresponding to a 5′ target sequence in the preexisting targeting vector and a 3′ homology arm corresponding to a 3′ target sequence in the preexisting vector. The insert nucleic acid can comprise a selection cassette flanked by target sites for one or more nuclease agents (e.g., rare-cutting nuclease agents) and repeat sequences. For example, the insert nucleic acid can comprise from 5′ to 3′: (1) a first repeat sequence; (2) a first target site for a first nuclease agent; (3) a selection cassette; (4) a second target site for a second nuclease agent; and (5) a second repeat sequence.
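The 5′ to 3′ arrangement of elements described above can be pictured with a short Python sketch. It is offered only as an illustration of the ordering: the class name, field names, and all sequences are hypothetical placeholders (the selection cassette string is not a real resistance gene), and the target site shown is the AscI recognition sequence, used here merely as one example of a rare-cutting site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModificationCassette:
    """Toy model of a modification cassette as plain sequence strings."""

    five_prime_arm: str
    first_repeat: str
    first_target_site: str
    selection_cassette: str
    second_target_site: str
    second_repeat: str
    three_prime_arm: str

    def insert_nucleic_acid(self) -> str:
        # Elements (1) through (5), in 5' to 3' order.
        return (
            self.first_repeat
            + self.first_target_site
            + self.selection_cassette
            + self.second_target_site
            + self.second_repeat
        )

    def sequence(self) -> str:
        # The insert nucleic acid flanked by the two homology arms.
        return self.five_prime_arm + self.insert_nucleic_acid() + self.three_prime_arm


example = ModificationCassette(
    five_prime_arm="ACGT" * 10,               # 40-nt placeholder arm
    first_repeat="CTGAAGGTCTCCATGACCTA",      # 20-nt placeholder repeat
    first_target_site="GGCGCGCC",             # AscI recognition sequence
    selection_cassette="ATGAAAGGGTTTCCCTAA",  # placeholder only
    second_target_site="GGCGCGCC",
    second_repeat="CTGAAGGTCTCCATGACCTA",     # identical to the first repeat
    three_prime_arm="TGCA" * 10,              # 40-nt placeholder arm
)
```

In this toy instance the two repeats are identical and the two target sites are the same, which is one of several configurations the description permits.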

The preexisting targeting vector can be any type of targeting vector of any size. In a specific example, the preexisting targeting vector is a large targeting vector (LTVEC) that is at least about 10 kb in length. In another example, it is at least about 100 kb in length. Targeting vectors and large targeting vectors are discussed in more detail elsewhere herein.

The modification cassette can be a linear nucleic acid or a circular nucleic acid, it can be a single-stranded nucleic acid or a double-stranded nucleic acid, and it can comprise deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). In one specific example, the modification cassette is a linear, double-stranded DNA.

The homology arms in the modification cassette are referred to herein as 5′ and 3′ (i.e., upstream and downstream) homology arms. This terminology relates to the relative position of the homology arms to the nucleic acid insert within the modification cassette. The 5′ and 3′ homology arms correspond to regions within the preexisting targeting vector to be modified, which are referred to herein as “5′ target sequence” and “3′ target sequence,” respectively.

A homology arm and a target sequence “correspond” or are “corresponding” to one another when the two regions share a sufficient level of sequence identity to one another to act as substrates for a homologous recombination reaction (e.g., bacterial homologous recombination). The term “homology” includes DNA sequences that are either identical or share sequence identity to a corresponding sequence. The sequence identity between a given target sequence and the corresponding homology arm found in the exogenous repair template can be any degree of sequence identity that allows for homologous recombination to occur. For example, the amount of sequence identity shared by the homology arm of the exogenous repair template (or a fragment thereof) and the target sequence (or a fragment thereof) can be at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity, such that the sequences undergo homologous recombination. Moreover, a corresponding region of homology between the homology arm and the corresponding target sequence can be of any length that is sufficient to promote homologous recombination. For example, the homology arms can be any size suitable for bacterial homologous recombination. For example, the homology arms can be at least about 35 nucleotides, at least about 40 nucleotides, at least about 50 nucleotides, at least about 60 nucleotides, at least about 70 nucleotides, at least about 80 nucleotides, at least about 90 nucleotides, or at least about 100 nucleotides. For example, the homology arms can be between about 35 nucleotides and about 500 nucleotides, between about 75 nucleotides and about 500 nucleotides, or between about 50 nucleotides and about 200 nucleotides (e.g., about 100 nucleotides). As another example, homology arms can be from about 35 nucleotides to about 2.5 kb in length, from about 35 nucleotides to about 1.5 kb in length, or from about 35 to about 500 nucleotides in length. For example, a given homology arm (or each of the homology arms) and/or corresponding target sequence can comprise corresponding regions of homology that are from about 35 to about 40, about 40 to about 50, about 50 to about 60, about 60 to about 70, about 70 to about 80, about 80 to about 90, about 90 to about 100, about 100 to about 150, about 150 to about 200, about 200 to about 250, about 250 to about 300, about 300 to about 350, about 350 to about 400, about 400 to about 450, or about 450 to about 500 nucleotides in length, such that the homology arms have sufficient homology to undergo homologous recombination with the corresponding target sequences within the target nucleic acid. Alternatively, a given homology arm (or each homology arm) and/or corresponding target sequence can comprise corresponding regions of homology that are from about 0.5 kb to about 1 kb, about 1 kb to about 1.5 kb, about 1.5 kb to about 2 kb, or about 2 kb to about 2.5 kb in length. For example, the homology arms can each be about 100 nucleotides in length. The homology arms can be symmetrical (each about the same size in length), or they can be asymmetrical (one longer than the other).

The modification cassette can be of any length. For example, a modification cassette can be from about 10 kb to about 400 kb, from about 20 kb to about 400 kb, from about 20 kb to about 30 kb, from about 30 kb to about 40 kb, from about 40 kb to about 50 kb, from about 50 kb to about 75 kb, from about 75 kb to about 100 kb, from about 100 kb to about 125 kb, from about 125 kb to about 150 kb, from about 150 kb to about 175 kb, from about 175 kb to about 200 kb, from about 200 kb to about 225 kb, from about 225 kb to about 250 kb, from about 250 kb to about 275 kb, from about 275 kb to about 300 kb, from about 200 kb to about 300 kb, from about 300 kb to about 350 kb, or from about 350 kb to about 400 kb. In one example, a modification cassette can be at least about 100 kb or 100 kb in length. A modification cassette can also be from about 50 kb to about 500 kb, from about 100 kb to about 125 kb, from about 300 kb to about 325 kb, from about 325 kb to about 350 kb, from about 350 kb to about 375 kb, from about 375 kb to about 400 kb, from about 400 kb to about 425 kb, from about 425 kb to about 450 kb, from about 450 kb to about 475 kb, or from about 475 kb to about 500 kb. Alternatively, a modification cassette can be at least 10 kb, at least 15 kb, at least 20 kb, at least 30 kb, at least 40 kb, at least 50 kb, at least 60 kb, at least 70 kb, at least 80 kb, at least 90 kb, at least 100 kb, at least 150 kb, at least 200 kb, at least 250 kb, at least 300 kb, at least 350 kb, at least 400 kb, at least 450 kb, or at least 500 kb or greater. In one example, the modification cassette is between about 1 kb and about 15 kb in length or between about 1 kb and about 10 kb in length (e.g., about 1.2 kb, about 5 kb, about 8 kb, or about 15 kb).

The modification cassette can comprise the targeted genetic modification. For example, the targeted genetic modification can be in the 5′ homology arm or the 3′ homology arm (e.g., a small modification such as a point mutation or a small deletion, insertion, or replacement that will not negatively affect the ability of the homology arm to recombine with the target sequence). Alternatively, the targeted genetic modification can be in the insert nucleic acid (e.g., when the targeted genetic modification is an insertion or a replacement). If the only targeted genetic modification is a deletion, then the 5′ homology arm and 3′ homology arm can be designed to target 5′ and 3′ target sequences, respectively, that flank the sequence targeted for deletion in the preexisting targeting vector. As one example, the targeted genetic modification can be in the first repeat sequence and/or in the second repeat sequence in the insert nucleic acid. Types of possible targeted genetic modifications are disclosed in more detail elsewhere herein. Some examples include point mutations, deletions, insertions, replacements, or combinations thereof.

The first and second repeat sequences in the modification cassette can be identical to each other. The repeat sequence can be identical to a sequence in the preexisting targeting vector. Alternatively, in the case that the targeted genetic modification comprises an insertion (e.g., an insertion alone, or an insertion in combination with a deletion (i.e., replacement)), the repeat sequence can be identical to the 5′ end or the 3′ end of the insertion.

The repeat sequence can be of any suitable size for subsequent assembly between the first and second repeat sequences in an in vitro assembly reaction. As one example, the repeat sequence can comprise at least about 20 nucleotides, at least about 30 nucleotides, at least about 40 nucleotides, or at least about 50 nucleotides. As another example, the repeat sequence can have a length of between about 20 nucleotides and about 100 nucleotides, between about 20 nucleotides and about 90 nucleotides, between about 20 nucleotides and about 80 nucleotides, between about 20 nucleotides and about 70 nucleotides, between about 20 nucleotides and about 60 nucleotides, between about 20 nucleotides and about 50 nucleotides, between about 20 nucleotides and about 40 nucleotides, between about 30 nucleotides and about 60 nucleotides, or between about 40 nucleotides and about 50 nucleotides. In a specific example, the repeat sequence can have a length of between about 40 nucleotides and about 50 nucleotides (e.g., about 40 nucleotides or about 50 nucleotides).

Following bacterial homologous recombination, bacterial cells comprising a modified targeting vector comprising the selection cassette (and comprising the targeted genetic modification) can be selected. Examples of selection cassettes and selection methods are disclosed in more detail elsewhere herein. In a specific example, the selection cassette imparts resistance to an antibiotic. For example, it can impart resistance to any one of ampicillin, chloramphenicol, tetracycline, kanamycin, spectinomycin, streptomycin, carbenicillin, bleomycin, erythromycin, or polymyxin B. In some methods, the preexisting targeting vector also comprises a second selection cassette. The second selection cassette can, for example, also impart resistance to an antibiotic. The selection cassette in the modification cassette and the second selection cassette in the preexisting targeting vector can each impart resistance to a different antibiotic. For example, the selection cassette in the modification cassette can impart resistance to a first antibiotic, and the second selection cassette in the preexisting targeting vector can impart resistance to a second, different antibiotic. In some methods, the second selection cassette can allow for selection in both bacterial cells and eukaryotic or mammalian cells.

Following selection, the first target site in the modified targeting vector can be cleaved with the first nuclease agent, and the second target site in the modified targeting vector can be cleaved with the second nuclease agent to remove the selection cassette and expose the first repeat sequence and the second repeat sequence in the modified targeting vector. For example, this step can be done in vitro. As an example, DNA can be isolated from the bacterial cells following bacterial homologous recombination and selection, after which the first target site in the modified targeting vector can be cleaved with the first nuclease agent in vitro, and the second target site in the modified targeting vector can be cleaved with the second nuclease agent in vitro to remove the selection cassette and expose the first repeat sequence and the second repeat sequence in the modified targeting vector.

The first nuclease agent and/or the second nuclease agent can be a rare-cutting nuclease agent as described elsewhere herein. For example, in some methods, the first target site and/or the second target site are not present in the preexisting targeting vector. The first and second target sites can be different, or the first target site can be identical to the second target site, and the first nuclease agent can be identical to the second nuclease agent. The first nuclease agent and/or the second nuclease agent can create a blunt end, a 5′ overhang, or a 3′ overhang. In one example, the first nuclease agent and/or the second nuclease agent creates a 3′ overhang.

In one specific example, the first nuclease agent and/or the second nuclease agent is a restriction enzyme or a rare-cutting restriction enzyme. Examples of rare-cutting restriction enzymes are disclosed elsewhere herein but can include, for example, NotI, XmaIII, SstII, SalI, NruI, NheI, Nb.BbvCI, BbvCI, AscI, AsiSI, FseI, PacI, PmeI, SbfI, SgrAI, SwaI, BspQI, SapI, SfiI, CspCI, AbsI, CciNI, FspAI, MauBI, MreI, MssI, PalAI, RgaI, RigI, SdaI, SfaAI, SgfI, SgrDI, SgsI, SmiI, SrfI, Sse232I, Sse8387I, LguI, PciSI, AarI, AjuI, AloI, BarI, PpiI, and PsrI.
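Because the target sites are, in some methods, absent from the preexisting targeting vector, a candidate enzyme can be screened in silico before a cassette is designed. The following Python sketch shows one way this might be done for a handful of the enzymes listed above. The function names and the toy vector sequence are hypothetical, and only enzymes with simple, unambiguous 8-bp recognition sequences are included; enzymes with degenerate or interrupted sites would need pattern matching that this sketch does not attempt.

```python
# Recognition sequences (5' to 3') for a few 8-bp cutters named above.
RARE_CUTTER_SITES = {
    "NotI": "GCGGCCGC",
    "AscI": "GGCGCGCC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
    "SwaI": "ATTTAAAT",
    "FseI": "GGCCGGCC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def enzymes_absent_from(vector_seq: str) -> list[str]:
    """Return enzymes whose site occurs on neither strand of the vector."""
    top = vector_seq.upper()
    bottom = reverse_complement(top)
    return sorted(
        name
        for name, site in RARE_CUTTER_SITES.items()
        if site not in top and site not in bottom
    )


# Toy vector carrying one NotI site and one PacI site (placeholder sequence).
toy_vector = "ATGC" + "GCGGCCGC" + "TTGACA" + "TTAATTAA" + "CCGT"
candidates = enzymes_absent_from(toy_vector)
```

For the toy vector, NotI and PacI are excluded because their sites are already present, leaving AscI, FseI, PmeI, and SwaI as candidates. The sketch treats the vector as a linear string, so a site spanning the junction of a circular vector would be missed.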

In another specific example, the first nuclease agent and/or the second nuclease agent can be an engineered nuclease agent. For example, the nuclease agent can be a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein and a guide RNA (gRNA) (e.g., Cas9 and a gRNA comprising a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA)), a zinc finger nuclease (ZFN), a Transcription Activator-Like Effector Nuclease (TALEN), or an engineered meganuclease.

Following cleavage/digestion, the exposed first repeat sequence can be assembled with the exposed second repeat sequence in an intramolecular in vitro assembly reaction to generate the targeting vector comprising the scarless targeted genetic modification. For example, in some such methods, neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent is present in the targeting vector comprising the scarless targeted genetic modification (i.e., following in vitro assembly). Likewise, in some such methods, only a single copy of the repeat sequence is present in the targeting vector comprising the scarless targeted genetic modification (i.e., following in vitro assembly).

Any suitable in vitro assembly method can be used. In one specific example, the in vitro assembly step can comprise incubating the modified targeting vector with an exonuclease, a DNA polymerase, and a DNA ligase. For example, the in vitro assembly method can comprise contacting the modified targeting vector with an exonuclease to expose complementary sequences between the first repeat sequence and the second repeat sequence, annealing the exposed complementary sequences, extending the 3′ ends of the annealed complementary sequences, and ligating the annealed complementary sequences. Examples of in vitro assembly methods are discussed in more detail elsewhere herein.
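The net sequence-level outcome of cleavage followed by intramolecular assembly can be illustrated with a short Python sketch. This is an idealized string model, not a simulation of the enzymology: it assumes that the entire target site is removed at each end (real nucleases cut within or near the site and the residual bases are lost during assembly), that the two repeats are identical, and that the vector can be treated as a linear string. The function name and all sequences are hypothetical.

```python
def cleave_and_assemble(modified_vector: str, site_1: str, site_2: str, repeat: str) -> str:
    """Idealized outcome of nuclease cleavage plus intramolecular assembly.

    Everything from the first target site through the second target site
    (the selection cassette and both sites) is dropped, and the two
    flanking copies of the repeat are collapsed into a single copy.
    """
    start = modified_vector.find(site_1)
    end = modified_vector.rfind(site_2)
    if start == -1 or end == -1 or end < start:
        raise ValueError("both target sites must be present, first before second")
    left = modified_vector[:start]                 # should end with the first repeat
    right = modified_vector[end + len(site_2):]    # should begin with the second repeat
    if not left.endswith(repeat) or not right.startswith(repeat):
        raise ValueError("repeat sequences must directly flank the target sites")
    # Annealing of the two repeats leaves exactly one copy in the product.
    return left + right[len(repeat):]


upstream = "AAAACCCC"                # placeholder vector sequence
downstream = "TTTTGGGG"              # placeholder vector sequence
repeat = "CTGAAGGTCTCCATGACCTA"      # placeholder 20-nt repeat
site = "GGCGCGCC"                    # AscI site used for both ends
selection = "ATGAAAGGGTTT"           # placeholder selection cassette

modified = upstream + repeat + site + selection + site + repeat + downstream
assembled = cleave_and_assemble(modified, site, site, repeat)
```

In the assembled product the selection cassette and both target sites are gone and the repeat occurs once, which corresponds to the scarless outcome described above; checking that the target site is absent from the product mirrors the optional background-reduction digest.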

In some methods, the vector produced by the in vitro assembly can be treated with the first nuclease agent and/or the second nuclease agent to reduce background (e.g., by cleaving any targeting vectors that did not successfully assemble and therefore still contain the target site for the first nuclease agent or the second nuclease agent). Such a step can help verify that neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent is present.

III. Scarless Introduction of a Targeted Modification into a Targeting Vector Via Bacterial Homologous Recombination and Intermolecular In Vitro Assembly

Other methods disclosed herein for scarless introduction of a targeted genetic modification into a preexisting targeting vector take advantage of in vitro assembly methods for intermolecular assembly. As one example, such methods can comprise performing bacterial homologous recombination between the preexisting targeting vector and a deletion cassette in a population of bacterial cells. The deletion cassette can comprise an insert nucleic acid flanked by a 5′ homology arm corresponding to a 5′ target sequence in the preexisting targeting vector and a 3′ homology arm corresponding to a 3′ target sequence in the preexisting vector. The 5′ target sequence and the 3′ target sequence can flank a region of the preexisting targeting vector into which the targeted genetic modification is to be introduced. The insert nucleic acid can comprise a selection cassette flanked by target sites for one or more nuclease agents (e.g., rare-cutting nuclease agents). For example, the insert nucleic acid can comprise from 5′ to 3′: (1) a first target site for a first nuclease agent; (2) a selection cassette; and (3) a second target site for a second nuclease agent.

The preexisting targeting vector can be any type of targeting vector of any size. In a specific example, the preexisting targeting vector is a large targeting vector (LTVEC) that is at least about 10 kb in length. In another example, it is at least about 100 kb in length. Targeting vectors are discussed in more detail elsewhere herein.

The deletion cassette can be a linear nucleic acid or a circular nucleic acid, it can be a single-stranded nucleic acid or a double-stranded nucleic acid, and it can comprise deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). In one specific example, the deletion cassette is a linear, double-stranded DNA.

The homology arms in the deletion cassette are referred to herein as 5′ and 3′ (i.e., upstream and downstream) homology arms. This terminology relates to the relative position of the homology arms to the nucleic acid insert within the deletion cassette. The 5′ and 3′ homology arms correspond to regions within the preexisting targeting vector to be modified, which are referred to herein as “5′ target sequence” and “3′ target sequence,” respectively.

A homology arm and a target sequence “correspond” or are “corresponding”to one another when the two regions share a sufficient level of sequenceidentity to one another to act as substrates for a homologousrecombination reaction (e.g., bacterial homologous recombination). Theterm “homology” includes DNA sequences that are either identical orshare sequence identity to a corresponding sequence. The sequenceidentity between a given target sequence and the corresponding homologyarm found in the exogenous repair template can be any degree of sequenceidentity that allows for homologous recombination to occur. For example,the amount of sequence identity shared by the homology arm of theexogenous repair template (or a fragment thereof) and the targetsequence (or a fragment thereof) can be at least 50%, 55%, 60%, 65%,70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%,92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity, suchthat the sequences undergo homologous recombination. Moreover, acorresponding region of homology between the homology arm and thecorresponding target sequence can be of any length that is sufficient topromote homologous recombination. For example, the homology arms can beany size suitable for bacterial homologous recombination. For example,the homology arms can be at least about 35 nucleotides, at least about40 nucleotides, at least about 50 nucleotides, at least about 60nucleotides, at least about 70 nucleotides, at least about 80nucleotides, at least about 90 nucleotides, at least about 100nucleotides. For example, the homology arms can be between about 35nucleotides and 500 nucleotides, between about 75 nucleotides and about500 nucleotides, or between about 50 nucleotides and about 200nucleotides (e.g., about 100 nucleotides). 
As another example, homology arms can be between about 35 nucleotides and about 2.5 kb in length, between about 35 nucleotides and about 1.5 kb in length, or between about 35 and about 500 nucleotides in length. For example, a given homology arm (or each of the homology arms) and/or corresponding target sequence can comprise corresponding regions of homology that are between about 35 to about 40, about 40 to about 50, about 50 to about 60, about 60 to about 70, about 70 to about 80, about 80 to about 90, about 90 to about 100, about 100 to about 150, about 150 to about 200, about 200 to about 250, about 250 to about 300, about 300 to about 350, about 350 to about 400, about 400 to about 450, or about 450 to about 500 nucleotides in length, such that the homology arms have sufficient homology to undergo homologous recombination with the corresponding target sequences within the target nucleic acid. Alternatively, a given homology arm (or each homology arm) and/or corresponding target sequence can comprise corresponding regions of homology that are between about 0.5 kb to about 1 kb, about 1 kb to about 1.5 kb, about 1.5 kb to about 2 kb, or about 2 kb to about 2.5 kb in length. For example, the homology arms can each be about 100 nucleotides in length. The homology arms can be symmetrical (each about the same size in length), or they can be asymmetrical (one longer than the other).
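As a non-limiting illustration of the length and sequence identity criteria described above, the following minimal Python sketch computes the percent identity between a homology arm and its corresponding target sequence and checks the arm against a minimum length. The function names, the assumption that the two sequences are already aligned without gaps, and the default 90% identity threshold are illustrative assumptions only; the 35-nucleotide default reflects the smallest arm length recited above.

```python
def percent_identity(arm: str, target: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: the sequences are already aligned and contain no gaps.
    """
    if not arm or len(arm) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), target.upper()))
    return 100.0 * matches / len(arm)


def arm_is_suitable(arm: str, target: str,
                    min_length: int = 35,
                    min_identity: float = 90.0) -> bool:
    """Return True if the arm meets the chosen length and identity cutoffs.

    Both cutoffs are illustrative parameters, not required values.
    """
    return (len(arm) >= min_length
            and percent_identity(arm, target) >= min_identity)
```

In this sketch, a 40-nucleotide arm that differs from its target sequence at a single position has 97.5% identity and passes the default cutoffs, whereas a 20-nucleotide arm fails the length check regardless of identity.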

The deletion cassette can be of any length. For example, a deletioncassette can be from about 10 kb to about 400 kb, from about 20 kb toabout 400 kb, from about 20 kb to about 30 kb, from about 30 kb to about40 kb, from about 40 kb to about 50 kb, from about 50 kb to about 75 kb,from about 75 kb to about 100 kb, from about 100 kb to 125 kb, fromabout 125 kb to about 150 kb, from about 150 kb to about 175 kb, about175 kb to about 200 kb, from about 200 kb to about 225 kb, from about225 kb to about 250 kb, from about 250 kb to about 275 kb or from about275 kb to about 300 kb, from about 200 kb to about 300 kb, from about300 kb to about 350 kb, or from about 350 kb to about 400 kb. In oneexample, a deletion cassette can be at least about 100 kb or 100 kb inlength. A deletion cassette can also be from about 50 kb to about 500kb, from about 100 kb to about 125 kb, from about 300 kb to about 325kb, from about 325 kb to about 350 kb, from about 350 kb to about 375kb, from about 375 kb to about 400 kb, from about 400 kb to about 425kb, from about 425 kb to about 450 kb, from about 450 kb to about 475kb, or from about 475 kb to about 500 kb. Alternatively, a deletioncassette can be at least 10 kb, at least 15 kb, at least 20 kb, at least30 kb, at least 40 kb, at least 50 kb, at least 60 kb, at least 70 kb,at least 80 kb, at least 90 kb, at least 100 kb, at least 150 kb, atleast 200 kb, at least 250 kb, at least 300 kb, at least 350 kb, atleast 400 kb, at least 450 kb, or at least 500 kb or greater. In oneexample, the deletion cassette is between about 1 kb and about 15 kb inlength or between about 1 kb and about 10 kb in length (e.g., about 1.2kb, about 5 kb, about 8 kb, or about 15 kb).

Following bacterial homologous recombination, bacterial cells comprisinga modified targeting vector comprising the selection cassette can beselected. Examples of selection cassettes and selection methods aredisclosed in more detail elsewhere herein. In a specific example, theselection cassette imparts resistance to an antibiotic. For example, itcan impart resistance to any one of ampicillin, chloramphenicol,tetracycline, kanamycin, spectinomycin, streptomycin, carbenicillin,bleomycin, erythromycin, or polymyxin B. In some methods, thepreexisting targeting vector also comprises a second selection cassette.The second selection cassette can, for example, also impart resistanceto an antibiotic. The selection cassette in the deletion cassette andthe second selection cassette in the preexisting targeting vector caneach impart resistance to a different antibiotic. For example, theselection cassette in the deletion cassette can impart resistance to afirst antibiotic, and the second selection cassette in the preexistingtargeting vector can impart resistance to a second, differentantibiotic. In some methods, the second selection cassette can allow forselection in both bacterial cells and eukaryotic or mammalian cells.

Following selection, the first target site in the modified targeting vector can be cleaved with the first nuclease agent, and the second target site in the modified targeting vector can be cleaved with the second nuclease agent to remove the selection cassette and expose an upstream end sequence and a downstream end sequence in the modified targeting vector. For example, this step can be done in vitro. As an example, DNA can be isolated from the bacterial cells following bacterial homologous recombination and selection, after which the first target site in the modified targeting vector can be cleaved with the first nuclease agent in vitro, and the second target site in the modified targeting vector can be cleaved with the second nuclease agent in vitro to remove the selection cassette and expose the upstream end sequence and the downstream end sequence in the modified targeting vector.

The first nuclease agent and/or the second nuclease agent can be a rare-cutting nuclease agent as described elsewhere herein. For example, in some methods, the first target site and/or the second target site are not present in the preexisting targeting vector. The first and second target sites can be different, or the first target site can be identical to the second target site, and the first nuclease agent can be identical to the second nuclease agent. The first nuclease agent and/or the second nuclease agent can create a blunt end, a 5′ overhang, or a 3′ overhang. In one example, the first nuclease agent and/or the second nuclease agent creates a 3′ overhang.

In one specific example, the first nuclease agent and/or the second nuclease agent is a restriction enzyme or a rare-cutting restriction enzyme. Examples of rare-cutting restriction enzymes are disclosed elsewhere herein but can include, for example, NotI, XmaIII, SstII, SalI, NruI, NheI, Nb.BbvCI, BbvCI, AscI, AsiSI, FseI, PacI, PmeI, SbfI, SgrAI, SwaI, BspQI, SapI, SfiI, CspCI, AbsI, CciNI, FspAI, MauBI, MreI, MssI, PalAI, RgaI, RigI, SdaI, SfaAI, SgfI, SgrDI, SgsI, SmiI, SrfI, Sse232I, Sse8387I, LguI, PciSI, AarI, AjuI, AloI, BarI, PpiI, and PsrI.
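As a non-limiting illustration of how a restriction enzyme whose target site is not present in the preexisting targeting vector might be identified, the following minimal Python sketch scans a vector sequence for a small panel of recognition sequences on both strands. The panel is limited to five of the enzymes listed above, the helper names are hypothetical, and the sketch treats the vector as a linear sequence, so a site spanning the junction of a circular vector would not be detected.

```python
# Recognition sequences for a few 8-bp rare cutters (illustrative subset).
SITES = {
    "NotI": "GCGGCCGC",
    "AscI": "GGCGCGCC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
    "SwaI": "ATTTAAAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list:
    """0-based start positions of a site on either strand (linear scan)."""
    seq = seq.upper()
    positions = set()
    # A set avoids scanning a palindromic site twice.
    for pattern in {site.upper(), reverse_complement(site)}:
        start = seq.find(pattern)
        while start != -1:
            positions.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(positions)


def absent_enzymes(vector_seq: str) -> list:
    """Names of panel enzymes with no recognition site in the vector."""
    return [name for name, site in SITES.items()
            if not find_sites(vector_seq, site)]
```

Enzymes returned by `absent_enzymes` for the preexisting targeting vector would be candidates for the first and second target sites placed in the deletion cassette under the assumptions stated above.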

In another specific example, the first nuclease agent and/or the second nuclease agent can be an engineered nuclease agent. For example, the nuclease agent can be a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein and a guide RNA (gRNA) (e.g., Cas9 and a gRNA comprising a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA)), a zinc finger nuclease (ZFN), a Transcription Activator-Like Effector Nuclease (TALEN), or an engineered meganuclease.

Following cleavage/digestion, the cleaved targeting vector can be assembled in an in vitro intermolecular assembly reaction with a modification cassette comprising the targeted genetic modification flanked by an upstream end sequence overlapping the upstream end sequence in the modified targeting vector and a downstream end sequence overlapping the downstream end sequence in the modified targeting vector to generate the targeting vector comprising the scarless targeted genetic modification. For example, in some such methods, neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent is present in the targeting vector comprising the scarless targeted genetic modification.

Any suitable in vitro assembly method can be used. In one specific example, the in vitro assembly step can comprise incubating the cleaved targeting vector and the modification cassette with an exonuclease, a DNA polymerase, and a DNA ligase. For example, the in vitro assembly method can comprise contacting the cleaved targeting vector and the modification cassette with an exonuclease to expose complementary sequences between the end sequences in the modified targeting vector and the end sequences in the modification cassette, annealing the exposed complementary sequences, extending the 3′ ends of the annealed complementary sequences, and ligating the annealed complementary sequences.

The modification cassette can be a linear nucleic acid or a circular nucleic acid, it can be a single-stranded nucleic acid or a double-stranded nucleic acid, and it can comprise deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). In one specific example, the modification cassette is a linear, double-stranded DNA.

The modification cassette can be of any length. For example, amodification cassette can be from about 10 kb to about 400 kb, fromabout 20 kb to about 400 kb, from about 20 kb to about 30 kb, from about30 kb to about 40 kb, from about 40 kb to about 50 kb, from about 50 kbto about 75 kb, from about 75 kb to about 100 kb, from about 100 kb to125 kb, from about 125 kb to about 150 kb, from about 150 kb to about175 kb, about 175 kb to about 200 kb, from about 200 kb to about 225 kb,from about 225 kb to about 250 kb, from about 250 kb to about 275 kb orfrom about 275 kb to about 300 kb, from about 200 kb to about 300 kb,from about 300 kb to about 350 kb, or from about 350 kb to about 400 kb.In one example, a modification cassette can be at least about 100 kb or100 kb in length. A modification cassette can also be from about 50 kbto about 500 kb, from about 100 kb to about 125 kb, from about 300 kb toabout 325 kb, from about 325 kb to about 350 kb, from about 350 kb toabout 375 kb, from about 375 kb to about 400 kb, from about 400 kb toabout 425 kb, from about 425 kb to about 450 kb, from about 450 kb toabout 475 kb, or from about 475 kb to about 500 kb. Alternatively, amodification cassette can be at least 10 kb, at least 15 kb, at least 20kb, at least 30 kb, at least 40 kb, at least 50 kb, at least 60 kb, atleast 70 kb, at least 80 kb, at least 90 kb, at least 100 kb, at least150 kb, at least 200 kb, at least 250 kb, at least 300 kb, at least 350kb, at least 400 kb, at least 450 kb, or at least 500 kb or greater. Inone specific example, the modification cassette is between about 400 bpand about 2 kb in length. In another example, the modification cassetteis between about 1 kb and about 15 kb in length or between about 1 kband about 10 kb in length (e.g., about 1.2 kb, about 5 kb, about 8 kb,or about 15 kb). In a specific example, the modification cassette is atleast about 200 nucleotides in length. 
In another specific example, themodification cassette is a size that cannot be directly synthesized orgenerated by polymerase chain reaction. For example, the modificationcassette can be at least about 5 kb, at least about 10 kb, at leastabout 15 kb, at least about 20 kb, at least about 25 kb, or at leastabout 30 kb in length.

The length of overlap between the upstream end sequence in the modification cassette and the upstream end sequence in the modified targeting vector and/or the length of the overlap between the downstream end sequence in the modification cassette and the downstream end sequence in the modified targeting vector can be any suitable length for an in vitro assembly reaction. As one example, the length of overlap can comprise at least about 20 nucleotides, at least about 30 nucleotides, at least about 40 nucleotides, or at least about 50 nucleotides. As another example, the length of overlap can be between about 20 nucleotides and about 100 nucleotides, between about 20 nucleotides and about 90 nucleotides, between about 20 nucleotides and about 80 nucleotides, between about 20 nucleotides and about 70 nucleotides, between about 20 nucleotides and about 60 nucleotides, between about 20 nucleotides and about 50 nucleotides, between about 20 nucleotides and about 40 nucleotides, between about 30 nucleotides and about 60 nucleotides, or between about 40 nucleotides and about 50 nucleotides. In a specific example, the length of overlap can be between about 40 nucleotides and about 50 nucleotides (e.g., about 40 nucleotides or about 50 nucleotides).

The modification cassette can comprise the targeted genetic modification. Types of targeted genetic modifications are disclosed in more detail elsewhere herein. Some examples include point mutations, deletions, insertions, replacements, or combinations thereof.

In some methods, the vector produced by the in vitro assembly can be treated with the first nuclease agent and/or the second nuclease agent to reduce background (e.g., by cleaving any targeting vectors that did not successfully assemble and therefore still contain the target site for the first nuclease agent or the second nuclease agent). Such a step can help verify that neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent is present.

IV. Bacterial Homologous Recombination

Any suitable bacterial homologous recombination (BHR) method can be used in the methods disclosed herein. Bacterial homologous recombination involves the transient and controlled expression of genes that mediate homologous recombination in bacterial cells such as Escherichia coli, thereby allowing the bacteria to mediate recombination between a modification cassette and a targeting vector (e.g., large targeting vector) sharing short homologous stretches. See, e.g., US 2004/0018626 and Valenzuela et al. (2003) Nat. Biotechnol. 21(6):652-659, each of which is herein incorporated by reference in its entirety.

The short homologous stretches can comprise an upstream homology region and a downstream homology region. The homology regions can be any size suitable for bacterial homologous recombination. For example, the homology regions can be at least about 35 nucleotides, at least about 40 nucleotides, at least about 50 nucleotides, at least about 60 nucleotides, at least about 70 nucleotides, at least about 80 nucleotides, at least about 90 nucleotides, or at least about 100 nucleotides. For example, the homology regions can be between about 35 nucleotides and about 500 nucleotides, between about 75 nucleotides and about 500 nucleotides, or between about 50 nucleotides and about 200 nucleotides (e.g., about 100 nucleotides). As another example, homology regions can be between about 35 nucleotides and about 2.5 kb in length, between about 35 nucleotides and about 1.5 kb in length, or between about 35 and about 500 nucleotides in length. For example, a homology region can be between about 35 to about 40, about 40 to about 50, about 50 to about 60, about 60 to about 70, about 70 to about 80, about 80 to about 90, about 90 to about 100, about 100 to about 150, about 150 to about 200, about 200 to about 250, about 250 to about 300, about 300 to about 350, about 350 to about 400, about 400 to about 450, or about 450 to about 500 nucleotides in length. Alternatively, a given homology region can be between about 0.5 kb to about 1 kb, about 1 kb to about 1.5 kb, about 1.5 kb to about 2 kb, or about 2 kb to about 2.5 kb in length. For example, the homology region can be about 100 nucleotides in length.

The technique of modifying a targeting vector using bacterial homologous recombination can be performed in a variety of systems (see, e.g., Yang et al. (1997) Nat. Biotechnol. 15:859-65; Muyrers et al. (1999) Nucleic Acids Res. 27:1555-1557; Angrand et al. (1999) Nucleic Acids Res., 27:e16; Narayanan et al. (1999) Gene Ther., 6:442-447; and Yu et al. (2000) Proc. Natl. Acad. Sci. U.S.A. 97:5978-5983, each of which is herein incorporated by reference in its entirety for all purposes). One example is ET cloning (Zhang et al. (1998) Nat. Genet. 20:123-128 and Narayanan et al. (1999) Gene Ther., 6:442-447, each of which is herein incorporated by reference in its entirety for all purposes) and variations of this technology (Yu et al. (2000) Proc. Natl. Acad. Sci. U.S.A. 97:5978-5983, herein incorporated by reference in its entirety for all purposes). ET refers to the recE and recT proteins that carry out the homologous recombination reaction. RecE is an exonuclease that trims one strand of linear double-stranded DNA 5′ to 3′, thus leaving behind a linear double-stranded fragment with a 3′ single-stranded overhang. This single-stranded overhang is coated by recT protein, which has single-stranded DNA (ssDNA) binding activity. ET cloning is performed using E. coli that transiently express the E. coli gene products of recE and recT and the bacteriophage lambda (λ) protein λgam. The λgam protein protects the donor DNA fragment from degradation by the recBC exonuclease system and is required for efficient ET cloning in recBC⁺ hosts such as the frequently used E. coli strain DH10B.

V. In Vitro Assembly

Any in vitro assembly method that can be used to assemble at least two nucleic acids or at least two ends of a single nucleic acid under conditions effective to join the DNA molecules to form a substantially intact DNA molecule can be used in the methods described herein. Some non-limiting examples of in vitro assembly methods include standard assembly using restriction enzymes, In-Fusion assembly, sequence and ligase independent cloning (SLIC), Gibson assembly, and Golden Gate assembly. See, e.g., Lee et al. (2013) Mol. Cells 35:359-370, herein incorporated by reference in its entirety for all purposes.

One example of a suitable in vitro assembly method is an isothermal, single-reaction method for assembling overlapping DNA molecules by the concerted action of an exonuclease (e.g., a 5′ exonuclease), a DNA polymerase, and a DNA ligase. Nucleic acids having overlapping ends (or a single, linear nucleic acid with overlapping ends) can be combined with a ligase, an exonuclease, and a DNA polymerase. For example, two adjacent DNA fragments sharing terminal sequence overlaps can be joined into a covalently sealed molecule in a one-step isothermal reaction. In a specific example, two or more DNA molecules to be assembled can be contacted in vitro in a single vessel with: (a) an isolated non-thermostable 5′-to-3′ exonuclease that lacks 3′ exonuclease activity (e.g., a non-processive exonuclease that chews back the ends of the double-stranded DNA molecules to expose single-stranded overhangs comprising the regions of overlap); (b) a crowding agent (which, among other functions, can accelerate nucleic acid annealing, so that the single-stranded overhangs are annealed (hybridized) specifically); (c) an isolated thermostable non-strand-displacing DNA polymerase with 3′ exonuclease activity, or a mixture of said DNA polymerase with a second DNA polymerase that lacks 3′ exonuclease activity (to fill in remaining single-stranded gaps in the annealed molecules, by extending the 3′ ends of the annealed regions); (d) an isolated thermostable ligase (which seals (ligates) the nicks thus formed); (e) a mixture of dNTPs; and (f) a suitable buffer under conditions that are effective for joining the two or more DNA molecules to form a first assembled dsDNA molecule in a one-step reaction. For single-stranded molecules, the exonuclease may be, but need not be, omitted. In a specific example, T5 exonuclease removes nucleotides from the 5′ ends of the double-stranded DNA molecules, complementary single-stranded DNA overhangs are annealed, Phusion DNA polymerase fills the gaps, and Taq DNA ligase seals the nicks. See, e.g., US 2010/0035768, US 2015/0376628, WO 2015/200334, and Gibson et al. (2009) Nat. Methods 6(5):343-345, each of which is herein incorporated by reference in its entirety for all purposes.
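As a non-limiting illustration of the expected product of such an overlap-based assembly, the following minimal Python sketch joins linear fragments in silico wherever the end of one fragment is identical to the start of the next. It models only the top-strand sequence of the final product, not the enzymatic steps, and the function names and the 20-nucleotide default minimum overlap are illustrative assumptions.

```python
def find_terminal_overlap(left: str, right: str, min_overlap: int = 20) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if no shared end sequence of at least `min_overlap` exists.
    """
    max_len = min(len(left), len(right))
    for n in range(max_len, min_overlap - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def assemble(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two fragments so the shared end sequence appears only once."""
    n = find_terminal_overlap(left, right, min_overlap)
    if n == 0:
        raise ValueError("fragments do not share a sufficient end overlap")
    return left + right[n:]


def insert_cassette(upstream_part: str, cassette: str, downstream_part: str,
                    min_overlap: int = 20) -> str:
    """Assemble the upstream vector part, the cassette, and the downstream part."""
    joined = assemble(upstream_part, cassette, min_overlap)
    return assemble(joined, downstream_part, min_overlap)
```

Because each shared end sequence is retained exactly once in the joined product, no additional nucleotides are introduced at either junction, which is the property relied on for a scarless modification.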

First and second single-stranded nucleic acids have overlapping ends when their respective ends are complementary to one another. First and second double-stranded nucleic acids have overlapping ends when a 5′ end of a strand of the first nucleic acid is complementary to the 3′ end of a strand of the second nucleic acid and vice versa. For example, for double-stranded overlapping end sequences, the strands of one nucleic acid can have at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identity to a corresponding strand of the other nucleic acid. In methods disclosed herein, the 5′ end of a strand of a dsDNA molecule to be assembled shares overlapping end sequences with the 3′ end of a strand of the other dsDNA molecule. The term overlapping end sequences includes both strands of a dsDNA molecule. Thus, one strand from the overlapping region can hybridize specifically to its complementary strand when the complementary regions of the overlapping sequences are presented in single-stranded overhangs from the 5′ and 3′ ends of the two polynucleotides to be assembled. An exonuclease can be used to remove nucleotides from the 5′ or 3′ end to create overhanging end sequences.

The length of the overlapping region can be sufficient such that the region occurs only once within any of the nucleic acids being assembled. Thus, other polynucleotides are prevented from annealing with the end sequences, and the assembly can be specific for the target nucleic acids. As one example, the length of overlap can comprise at least about 20 nucleotides, at least about 30 nucleotides, at least about 40 nucleotides, or at least about 50 nucleotides. As another example, the length of overlap can be between about 20 nucleotides and about 100 nucleotides, between about 20 nucleotides and about 90 nucleotides, between about 20 nucleotides and about 80 nucleotides, between about 20 nucleotides and about 70 nucleotides, between about 20 nucleotides and about 60 nucleotides, between about 20 nucleotides and about 50 nucleotides, between about 20 nucleotides and about 40 nucleotides, between about 30 nucleotides and about 60 nucleotides, or between about 40 nucleotides and about 50 nucleotides. In a specific example, the length of overlap can be between about 40 nucleotides and about 50 nucleotides (e.g., about 40 nucleotides or about 50 nucleotides).
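As a non-limiting illustration of the uniqueness criterion described above, the following minimal Python sketch counts how many times an overlapping region occurs, on either strand, within a nucleic acid to be assembled. The function names are hypothetical, and the sketch assumes linear sequences containing only the unambiguous bases A, C, G, and T.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_occurrences(fragment: str, overlap: str) -> int:
    """Number of times `overlap` occurs in `fragment` on either strand."""
    fragment = fragment.upper()
    total = 0
    # A set prevents double counting when the overlap is palindromic.
    for pattern in {overlap.upper(), reverse_complement(overlap)}:
        start = fragment.find(pattern)
        while start != -1:
            total += 1
            start = fragment.find(pattern, start + 1)
    return total


def overlap_is_unique(fragment: str, overlap: str) -> bool:
    """True if the overlap occurs exactly once in the fragment."""
    return count_occurrences(fragment, overlap) == 1
```

An overlapping region that fails this check for any nucleic acid in the reaction could anneal at an unintended position, so a longer or shifted overlap may be preferable in that case.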

The overlapping sequences can be contacted with an exonuclease to expose complementary sequences (e.g., complementary single strand sequences) between the overlapping sequences. The exonuclease digestion can be carried out under conditions that are effective to remove (chew back) a sufficient number of nucleotides to allow for specific annealing of the exposed single-stranded regions of complementarity. In general, a portion of the region of overlap or the entire region of overlap is chewed back, leaving overhangs which comprise a portion of the region of overlap or the entire region of overlap. In some methods, the exonuclease digestion may be carried out by a polymerase in the absence of dNTPs (e.g., T5 DNA polymerase), whereas in other methods the exonuclease digestion may be carried out in the presence of dNTPs by an exonuclease that lacks polymerase activity (e.g., exonuclease III).

Any of a variety of 5′-to-3′, double-strand specific exodeoxyribonucleases may be used to chew back the ends of nucleic acids in the methods disclosed herein. The term 5′ exonuclease is sometimes used herein to refer to a 5′-to-3′ exodeoxyribonuclease. A non-processive exonuclease refers to an exonuclease that degrades a limited number of (e.g., only a few) nucleotides during each DNA binding event. Digestion with a 5′ exonuclease produces 3′ single-stranded overhangs in the DNA molecules. 5′ exonucleases used in in vitro assembly methods can lack 3′ exonuclease activity, can generate 5′ phosphate ends, and can initiate degradation from both 5′-phosphorylated and unphosphorylated ends. Exonucleases used in the in vitro assembly methods described herein can initiate digestion from the 5′ end of a molecule, whether it is a blunt end, or it has a small 5′ or 3′ recessed end. Suitable exonucleases are well known and include, for example, phage T5 exonuclease (phage T5 gene D15 product), phage lambda exonuclease, RecE of Rac prophage, exonuclease VIII from E. coli, phage T7 exonuclease (phage T7 gene 6 product), or any of a variety of 5′ exonucleases that are involved in homologous recombination reactions. As one example, the exonuclease is T5 exonuclease or lambda exonuclease. In a specific example, the exonuclease is T5 exonuclease. In another specific example, the exonuclease is not phage T7 exonuclease.
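As a non-limiting illustration of how digestion with a 5′ exonuclease produces 3′ single-stranded overhangs, the following minimal Python sketch models a blunt-ended, linear double-stranded DNA by its top strand and removes a fixed number of nucleotides from the 5′ end of each strand. The function name, the returned dictionary keys, and the idealized assumption that both ends are chewed back by the same number of nucleotides are illustrative only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def chew_back_5prime(top: str, n: int) -> dict:
    """Idealized 5'-to-3' chew-back of a blunt, linear duplex.

    `top` is the top strand written 5'->3'. All returned sequences are
    also written 5'->3'.
    """
    top = top.upper()
    if n < 0 or 2 * n > len(top):
        raise ValueError("n must be between 0 and half the duplex length")
    bottom = reverse_complement(top)
    return {
        # Each strand loses n nucleotides from its own 5' end.
        "top_remaining": top[n:],
        "bottom_remaining": bottom[n:],
        # The exposed 3' overhangs are the last n nucleotides of each strand.
        "right_3prime_overhang": top[len(top) - n:],
        "left_3prime_overhang": bottom[len(bottom) - n:],
    }
```

In this model, the right 3′ overhang of one fragment can anneal to the left 3′ overhang of a second fragment when the two overhangs are reverse complements of one another, which is the case when the end of the first top strand is identical to the start of the second.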

In situations where the region of overlap is long, it may only be necessary to chew back a portion of the region, provided that the single-stranded overhangs thus generated are of sufficient length and base content to anneal specifically under the conditions of the reaction. The term annealing specifically includes situations wherein a particular pair of single-stranded overhangs will anneal preferentially (or exclusively) to one another, rather than to other single-stranded overhangs (e.g., non-complementary overhangs) that are present in the reaction mixture. By preferentially is meant that at least about 95% of the overhangs will anneal to the complementary overhang. Generally, the homologous regions of overlap (the single-stranded overhangs or their complements) contain identical sequences. However, partially identical sequences may be used, provided that the single-stranded overhangs can anneal specifically under the conditions of the reactions.

Following the annealing of single-stranded DNA (e.g., overhangs produced by the action of exonuclease when the DNA molecules to be joined are dsDNA or overhangs produced by creating nicks at different target sites on each strand), the single-stranded gaps left by the exonuclease can be filled in with a suitable, non-strand-displacing DNA polymerase, and the nicks thus formed can be sealed with a ligase. A non-strand-displacing DNA polymerase as used herein is a DNA polymerase that terminates synthesis of DNA when it encounters DNA strands which lie in its path as it proceeds to copy a dsDNA molecule, or that degrades the encountered DNA strands as it proceeds while concurrently filling in the gap thus created, thereby generating a moving nick (nick translation).

Following annealing of a single strand of a first polynucleotide to the complementary strand of a second polynucleotide, the 3′ end of the first polynucleotide can be extended based on the template of the second polynucleotide strand, and the 3′ end of the second polynucleotide strand can be extended based on the template of the first polynucleotide strand. By extending the complementary 3′ end of each polynucleotide, the polynucleotides can be assembled. Following assembly, nicks between the extended 3′ end of a strand from one fragment and adjacent 5′ end of a strand from the other fragment can be sealed by ligation. More specifically, the hydroxyl group of the extended 3′ end of the first polynucleotide can be ligated to the phosphate group of the 5′ end of the second polynucleotide, and the hydroxyl group of the extended 3′ end of the second polynucleotide can be ligated to the phosphate group of the 5′ end of the first polynucleotide.

The ligation reaction can be performed by any of a variety of suitable thermostable DNA ligases. Among suitable ligases are, for example, Taq ligase, Ampligase Thermostable DNA ligase, or the thermostable ligases disclosed in U.S. Pat. No. 6,576,453, herein incorporated by reference in its entirety for all purposes.

A suitable amount of a crowding agent, such as PEG, in the reaction mixture can allow for, enhance, or facilitate molecular crowding. Such a crowding agent can allow components of the solution to come into closer contact with one another. For example, DNA molecules to be recombined can come into closer proximity, which can facilitate the annealing of the single-stranded overhangs. Suitable crowding agents are known and include a variety of well-known macromolecules, such as polymers such as polyethylene glycol (PEG), Ficolls such as Ficoll 70, or dextrans such as dextran 70.

Reaction components and conditions (such as salts, buffers, a suitable energy source (such as ATP or NAD), pH of the reaction mixture, and so forth) that are present in an assembly reaction mixture may not be optimal for the individual enzymes (exonuclease, polymerase, and ligase) but can serve as a compromise that is effective for the entire set of reactions.

VI. Targeting Vectors and Large Targeting Vectors (LTVECs)

The targeting vectors used in the methods disclosed herein can be any suitable targeting vector. The targeting vectors can comprise deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), they can be single-stranded or double-stranded, and they can be in linear or circular form. The targeting vectors can be a bacterial artificial chromosome (BAC), a modified BAC, or a fragment of a BAC. They can comprise human DNA, rodent DNA (e.g., mouse DNA or rat DNA), synthetic DNA, or any combination thereof.

Some targeting vectors used in the methods disclosed herein are large targeting vectors (LTVECs). LTVECs include targeting vectors that comprise homology arms that correspond to and are derived from nucleic acid sequences larger than those typically used by other approaches intended to perform homologous recombination in cells. LTVECs also include targeting vectors comprising nucleic acid inserts having nucleic acid sequences larger than those typically used by other approaches intended to perform homologous recombination in cells. For example, LTVECs make possible the modification of large loci that cannot be accommodated by traditional plasmid-based targeting vectors because of their size limitations. For example, the targeted locus can be (i.e., the 5′ and 3′ homology arms can correspond to) a locus of the cell that is not targetable using a conventional method or that can be targeted only incorrectly or only with significantly low efficiency in the absence of a nick or double-strand break induced by a nuclease agent (e.g., a Cas protein). Examples of LTVECs include vectors derived from a bacterial artificial chromosome (BAC), a human artificial chromosome, or a yeast artificial chromosome (YAC). Non-limiting examples of LTVECs and methods for making them are described, e.g., in U.S. Pat. Nos. 6,586,251; 6,596,541; and 7,105,348; and in WO 2002/036789, each of which is herein incorporated by reference in its entirety for all purposes. LTVECs can be in linear form or in circular form. LTVECs can be of any length and are typically at least 10 kb in length. The size of an LTVEC can be too large to enable screening of targeting events by conventional assays, e.g., Southern blotting and long-range (e.g., 1 kb to 5 kb) PCR.

The targeting vectors (e.g., LTVECs) used in the methods disclosedherein can be of any length. For example, a targeting vector can be fromabout 10 kb to about 400 kb, from about 20 kb to about 400 kb, fromabout 20 kb to about 30 kb, from about 30 kb to about 40 kb, from about40 kb to about 50 kb, from about 50 kb to about 75 kb, from about 75 kbto about 100 kb, from about 100 kb to 125 kb, from about 125 kb to about150 kb, from about 150 kb to about 175 kb, about 175 kb to about 200 kb,from about 200 kb to about 225 kb, from about 225 kb to about 250 kb,from about 250 kb to about 275 kb or from about 275 kb to about 300 kb,from about 200 kb to about 300 kb, from about 300 kb to about 350 kb, orfrom about 350 kb to about 400 kb. In one example, a targeting vectorcan be at least about 100 kb or 100 kb in length. A targeting vector canalso be from about 50 kb to about 500 kb, from about 100 kb to about 125kb, from about 300 kb to about 325 kb, from about 325 kb to about 350kb, from about 350 kb to about 375 kb, from about 375 kb to about 400kb, from about 400 kb to about 425 kb, from about 425 kb to about 450kb, from about 450 kb to about 475 kb, or from about 475 kb to about 500kb. Alternatively, a targeting vector can be at least 10 kb, at least 15kb, at least 20 kb, at least 30 kb, at least 40 kb, at least 50 kb, atleast 60 kb, at least 70 kb, at least 80 kb, at least 90 kb, at least100 kb, at least 150 kb, at least 200 kb, at least 250 kb, at least 300kb, at least 350 kb, at least 400 kb, at least 450 kb, or at least 500kb or greater.

VII. Nuclease Agents

Any rare-cutting nuclease agent can be used in the methods disclosed herein. A rare-cutting nuclease agent is a nuclease agent with a target sequence or recognition sequence that occurs rarely in a genome. Similarly, any nuclease agent with a target sequence or recognition sequence that does not occur outside of the intended cleavage site(s) in the targeting vectors described herein can be used. For example, any nuclease agent that does not have a target sequence or recognition sequence in the preexisting targeting vectors in the methods described herein can be used.
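As an illustration only, the check that a recognition sequence is absent from a preexisting targeting vector (or occurs only at the intended site) can be sketched in Python as follows. The function names, the toy sequences, and the treatment of the vector as a linear string are assumptions made for this sketch and are not part of the disclosed methods; a circular vector would additionally require a search across the junction of the linearized sequence.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(site):
    """Reverse complement of a site written with IUPAC codes."""
    return site.upper().translate(COMPLEMENT)[::-1]


def site_positions(site, sequence):
    """Sorted 0-based start positions of `site` on either strand of a linear sequence."""
    sequence = sequence.upper()
    hits = set()
    for oriented in {site.upper(), reverse_complement(site)}:
        # A lookahead is used so that overlapping occurrences are all reported.
        pattern = re.compile("(?=" + "".join(IUPAC[base] for base in oriented) + ")")
        hits.update(match.start() for match in pattern.finditer(sequence))
    return sorted(hits)


def is_absent(site, sequence):
    """True when the site occurs on neither strand of the sequence."""
    return not site_positions(site, sequence)


# Hypothetical toy sequences, not taken from any real vector.
toy_vector = "ATGCGCGGCCGCATTACG"
print(site_positions("GCGGCCGC", toy_vector))  # NotI site
print(is_absent("GTTTAAAC", toy_vector))       # PmeI site
```

A nuclease agent whose site returns an empty list for the preexisting vector would satisfy the last criterion stated above; a site returning exactly one position at the intended cleavage site would satisfy the preceding one.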

Any nuclease agent as described above that induces a nick ordouble-strand break at a desired target sequence can be used in themethods and compositions disclosed herein. A naturally occurring ornative nuclease agent can be employed so long as the nuclease agentinduces a nick or double-strand break in a desired target sequence.Alternatively, a modified or engineered nuclease agent can be employed.An “engineered nuclease agent” includes a nuclease that is engineered(modified or derived) from its native form to specifically recognize andinduce a nick or double-strand break in the desired target sequence.Thus, an engineered nuclease agent can be derived from a native,naturally occurring nuclease agent or it can be artificially created orsynthesized. The engineered nuclease can induce a nick or double-strandbreak in a target sequence, for example, wherein the target sequence isnot a sequence that would have been recognized by a native(non-engineered or non-modified) nuclease agent. The modification of thenuclease agent can be as little as one amino acid in a protein cleavageagent or one nucleotide in a nucleic acid cleavage agent. Producing anick or double-strand break in a target sequence or other DNA can bereferred to herein as “cutting” or “cleaving” the target sequence orother DNA.

Active variants and fragments of the exemplified target sequences are also provided. Such active variants can comprise at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the given target sequence, wherein the active variants retain biological activity and hence are capable of being recognized and cleaved by a nuclease agent in a sequence-specific manner. Assays to measure the double-strand break of a target sequence by a nuclease agent are well-known. See, e.g., Frendewey et al. (2010) Methods in Enzymology 476:295-307, which is incorporated by reference herein in its entirety for all purposes.
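For orientation, a percent sequence identity of the kind recited above can be sketched as follows. This is a deliberately simplified, hypothetical helper that compares two ungapped sequences of equal length position by position; sequence identity is ordinarily determined with an alignment algorithm, which this sketch does not implement. The example sequences are invented.

```python
def percent_identity(candidate, reference):
    """Percent of positions at which two equal-length, ungapped sequences agree."""
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


# Hypothetical 10-nucleotide target sequence and a variant with one change.
print(percent_identity("ACGTACGTAA", "ACGTACGTAC"))
```

Under this simplified measure, a 20-nucleotide variant with seven differences from the reference would have 65% identity, the lowest threshold listed above.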

Active variants and fragments of nuclease agents (i.e., an engineerednuclease agent) are also provided. Such active variants can comprise atleast 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,98%, 99% or more sequence identity to the native nuclease agent, whereinthe active variants retain the ability to cut at a desired targetsequence and hence retain nick or double-strand-break-inducing activity.For example, any of the nuclease agents described herein can be modifiedfrom a native endonuclease sequence and designed to recognize and inducea nick or double-strand break at a target sequence that was notrecognized by the native nuclease agent. Thus, some engineered nucleaseshave a specificity to induce a nick or double-strand break at a targetsequence that is different from the corresponding native nuclease agenttarget sequence. Assays for nick or double-strand-break-inducingactivity are known and generally measure the overall activity andspecificity of the endonuclease on DNA substrates containing the targetsequence.

A nuclease target sequence includes a DNA sequence at which a nick or double-strand break is induced by a nuclease agent. The length of the target sequence can vary, and includes, for example, target sequences that are about 30-36 bp for a zinc finger nuclease (ZFN) pair (i.e., about 15-18 bp for each ZFN), about 36 bp for a Transcription Activator-Like Effector Nuclease (TALEN), or about 20 bp for a CRISPR/Cas9 guide RNA.

A. Restriction Enzymes

Nuclease agents suitable for use in the methods disclosed herein can comprise restriction endonucleases, which include Type I, Type II, Type III, and Type IV endonucleases. Type I and Type III restriction endonucleases recognize specific recognition sites, but typically cleave at a variable position from the nuclease binding site, which can be hundreds of base pairs away from the cleavage site (recognition site). In Type II systems the restriction activity is independent of any methylase activity, and cleavage typically occurs at specific sites within or near to the binding site. Most Type II enzymes cut palindromic sequences; however, Type IIa enzymes recognize non-palindromic recognition sites and cleave outside of the recognition site, Type IIb enzymes cut sequences twice with both sites outside of the recognition site, and Type IIs enzymes recognize an asymmetric recognition site and cleave on one side and at a defined distance of about 1-20 nucleotides from the recognition site. Type IV restriction enzymes target methylated DNA. Restriction enzymes are further described and classified, for example, in the REBASE database (webpage at rebase.neb.com; Roberts et al. (2003) Nucleic Acids Res. 31:418-20); Roberts et al. (2003) Nucleic Acids Res. 31:1805-12; and Belfort et al. (2002) in Mobile DNA II, pp. 761-783, Eds. Craigie et al. (ASM Press, Washington, D.C.), each of which is herein incorporated by reference in its entirety.

In some methods, a rare-cutting restriction enzyme is used. A rare-cutting restriction enzyme refers to an enzyme with a target site or recognition site that occurs only rarely in a genome. The size of restriction fragments generated by cutting a hypothetical random genome with a restriction enzyme may be approximated by 4^N, where N is the number of nucleotides in the recognition site of the enzyme. For example, an enzyme with a recognition site consisting of 7 nucleotides would cut a genome once every 4^7 bp, producing fragments of about 16,384 bp. Generally, rare-cutter enzymes have recognition sites comprising 6 or more nucleotides. For example, a rare cutter enzyme may have a recognition site comprising or consisting of 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 nucleotides. Examples of rare-cutting restriction enzymes include NotI (GCGGCCGC), XmaIII (CGGCCG), SstII (CCGCGG), SalI (GTCGAC), NruI (TCGCGA), NheI (GCTAGC), Nb.BbvCI (CCTCAGC), BbvCI (CCTCAGC), AscI (GGCGCGCC), AsiSI (GCGATCGC), FseI (GGCCGGCC), PacI (TTAATTAA), PmeI (GTTTAAAC), SbfI (CCTGCAGG), SgrAI (CRCCGGYG), SwaI (ATTTAAAT), BspQI (GCTCTTC), SapI (GCTCTTC), SfiI (GGCCNNNNNGGCC), CspCI (CAANNNNNGTGG), AbsI (CCTCGAGG), CciNI (GCGGCCGC), FspAI (RTGCGCAY), MauBI (CGCGCGCG), MreI (CGCCGGCG), MssI (GTTTAAAC), PalAI (GGCGCGCC), RgaI (GCGATCGC), RigI (GGCCGGCC), SdaI (CCTGCAGG), SfaAI (GCGATCGC), SgfI (GCGATCGC), SgrDI (CGTCGACG), SgsI (GGCGCGCC), SmiI (ATTTAAAT), SrfI (GCCCGGGC), Sse232I (CGCCGGCG), Sse8387I (CCTGCAGG), LguI (GCTCTTC), PciSI (GCTCTTC), AarI (CACCTGC), AjuI (GAANNNNNNNTTGG), AloI (GAACNNNNNNTCC), BarI (GAAGNNNNNNTAC), PpiI (GAACNNNNNCTC), PsrI (GAACNNNNNNTAC), and others.
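The fragment-size approximation stated above can be worked through with a short Python sketch. It assumes, as the text does, a hypothetical random genome (equal base frequencies, independent positions) and a fully specified recognition site; degenerate positions such as N, R, or Y would lower the effective N. The function names are chosen for this illustration only.

```python
def expected_fragment_size(site_length):
    """Approximate mean fragment size (bp) for a fully specified site of N nucleotides: 4^N."""
    return 4 ** site_length


def expected_cut_count(genome_length, site_length):
    """Approximate number of sites expected in a random genome of the given length (bp)."""
    return genome_length / expected_fragment_size(site_length)


for n in (6, 7, 8):
    print(n, expected_fragment_size(n))
```

For N = 7 this gives 16,384 bp, matching the example above; each additional specified nucleotide makes the site four times rarer.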

B. CRISPR/Cas Systems

Clustered Regularly Interspersed Short Palindromic Repeats(CRISPR)/CRISPR-associated (Cas) systems can also be used as therare-cutting nuclease agents in the methods disclosed herein. CRISPR/Cassystems include transcripts and other elements involved in theexpression of, or directing the activity of, Cas genes. A CRISPR/Cassystem can be, for example, a type I, a type II, a type III system, or atype V system (e.g., subtype V-A or subtype V-B). CRISPR/Cas systemsused in the compositions and methods disclosed herein can benon-naturally occurring. A “non-naturally occurring” system includesanything indicating the involvement of the hand of man, such as one ormore components of the system being altered or mutated from theirnaturally occurring state, being at least substantially free from atleast one other component with which they are naturally associated innature, or being associated with at least one other component with whichthey are not naturally associated. For example, some CRISPR/Cas systemsemploy non-naturally occurring CRISPR complexes comprising a gRNA and aCas protein that do not naturally occur together, employ a Cas proteinthat does not occur naturally, or employ a gRNA that does not occurnaturally.

Cas Proteins and Polynucleotides Encoding Cas Proteins.

Cas proteins generally comprise at least one RNA recognition or bindingdomain that can interact with guide RNAs (gRNAs). Cas proteins can alsocomprise nuclease domains (e.g., DNase domains or RNase domains),DNA-binding domains, helicase domains, protein-protein interactiondomains, dimerization domains, and other domains. Some such domains(e.g., DNase domains) can be from a native Cas protein. Other suchdomains can be added to make a modified Cas protein. A nuclease domainpossesses catalytic activity for nucleic acid cleavage, which includesthe breakage of the covalent bonds of a nucleic acid molecule. Cleavagecan produce blunt ends or staggered ends, and it can be single-strandedor double-stranded. For example, a wild type Cas9 protein will typicallycreate a blunt cleavage product. Alternatively, a wild type Cpf1 protein(e.g., FnCpf1) can result in a cleavage product with a 5-nucleotide 5′overhang, with the cleavage occurring after the 18th base pair from thePAM sequence on the non-targeted strand and after the 23rd base on thetargeted strand. A Cas protein can have full cleavage activity to createa double-strand break at a target genomic locus (e.g., a double-strandbreak with blunt ends), or it can be a nickase that creates asingle-strand break at a target genomic locus.

Examples of Cas proteins include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5,Cas5e (CasD), Cas6, Cas6e, Cas6f, Cas7, Cas8a1, Cas8a2, Cas8b, Cas8c,Cas9 (Csn1 or Csx12), Cas10, Cas10d, CasF, CasG, CasH, Csy1, Csy2, Csy3,Cse1 (CasA), Cse2 (CasB), Cse3 (CasE), Cse4 (CasC), Csc1, Csc2, Csa5,Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1,Csb2, Csb3, Csx17, Csx14, Csx10, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1,Csf2, Csf3, Csf4, and Cu1966, and homologs or modified versions thereof.

An exemplary Cas protein is a Cas9 protein or a protein derived from a Cas9 protein. Cas9 proteins are from a type II CRISPR/Cas system and typically share four key motifs with a conserved architecture. Motifs 1, 2, and 4 are RuvC-like motifs, and motif 3 is an HNH motif. Exemplary Cas9 proteins are from Streptococcus pyogenes, Streptococcus thermophilus, Streptococcus sp., Staphylococcus aureus, Nocardiopsis dassonvillei, Streptomyces pristinaespiralis, Streptomyces viridochromogenes, Streptosporangium roseum, Alicyclobacillus acidocaldarius, Bacillus pseudomycoides, Bacillus selenitireducens, Exiguobacterium sibiricum, Lactobacillus delbrueckii, Lactobacillus salivarius, Microscilla marina, Burkholderiales bacterium, Polaromonas naphthalenivorans, Polaromonas sp., Crocosphaera watsonii, Cyanothece sp., Microcystis aeruginosa, Synechococcus sp., Acetohalobium arabaticum, Ammonifex degensii, Caldicellulosiruptor becscii, Candidatus Desulforudis, Clostridium botulinum, Clostridium difficile, Finegoldia magna, Natranaerobius thermophilus, Pelotomaculum thermopropionicum, Acidithiobacillus caldus, Acidithiobacillus ferrooxidans, Allochromatium vinosum, Marinobacter sp., Nitrosococcus halophilus, Nitrosococcus watsoni, Pseudoalteromonas haloplanktis, Ktedonobacter racemifer, Methanohalobium evestigatum, Anabaena variabilis, Nodularia spumigena, Nostoc sp., Arthrospira maxima, Arthrospira platensis, Arthrospira sp., Lyngbya sp., Microcoleus chthonoplastes, Oscillatoria sp., Petrotoga mobilis, Thermosipho africanus, Acaryochloris marina, Neisseria meningitidis, or Campylobacter jejuni. Additional examples of the Cas9 family members are described in WO 2014/131833, herein incorporated by reference in its entirety for all purposes. Cas9 from S. pyogenes (SpCas9) (assigned SwissProt accession number Q99ZW2) is an exemplary Cas9 protein. Cas9 from S. aureus (SaCas9) (assigned UniProt accession number J7RUA5) is another exemplary Cas9 protein. Cas9 from Campylobacter jejuni (CjCas9) (assigned UniProt accession number Q0P897) is another exemplary Cas9 protein. See, e.g., Kim et al. (2017) Nat. Commun. 8:14500, herein incorporated by reference in its entirety for all purposes. SaCas9 is smaller than SpCas9, and CjCas9 is smaller than both SaCas9 and SpCas9. Cas9 from Neisseria meningitidis (Nme2Cas9) is another exemplary Cas9 protein. See, e.g., Edraki et al. (2019) Mol. Cell 73(4):714-726, herein incorporated by reference in its entirety for all purposes. Cas9 proteins from Streptococcus thermophilus (e.g., Streptococcus thermophilus LMD-9 Cas9 encoded by the CRISPR1 locus (St1Cas9) or Streptococcus thermophilus Cas9 from the CRISPR3 locus (St3Cas9)) are other exemplary Cas9 proteins. Cas9 from Francisella novicida (FnCas9) or the RHA Francisella novicida Cas9 variant that recognizes an alternative PAM (E1369R/E1449H/R1556A substitutions) are other exemplary Cas9 proteins. These and other exemplary Cas9 proteins are reviewed, e.g., in Cebrian-Serrano and Davies (2017) Mamm. Genome 28(7):247-261, herein incorporated by reference in its entirety for all purposes. An exemplary Cas9 protein sequence can comprise, consist essentially of, or consist of SEQ ID NO: 1. An exemplary DNA encoding the Cas9 protein can comprise, consist essentially of, or consist of SEQ ID NO: 2.

Another example of a Cas protein is a Cpf1 (CRISPR from Prevotella andFrancisella 1) protein. Cpf1 is a large protein (about 1300 amino acids)that contains a RuvC-like nuclease domain homologous to thecorresponding domain of Cas9 along with a counterpart to thecharacteristic arginine-rich cluster of Cas9. However, Cpf1 lacks theHNH nuclease domain that is present in Cas9 proteins, and the RuvC-likedomain is contiguous in the Cpf1 sequence, in contrast to Cas9 where itcontains long inserts including the HNH domain. See, e.g., Zetsche etal. (2015) Cell 163(3):759-771, herein incorporated by reference in itsentirety for all purposes. Exemplary Cpf1 proteins are from Francisellatularensis 1, Francisella tularensis subsp. novicida, Prevotellaalbensis, Lachnospiraceae bacterium MC2017 1, Butyrivibrioproteoclasticus, Peregrinibacteria bacterium GW2011_GWA2_33_10,Parcubacteria bacterium GW2011_GWC2_44_17, Smithella sp. SCADC,Acidaminococcus sp. BV3L6, Lachnospiraceae bacterium MA2020, CandidatusMethanoplasma termitum, Eubacterium eligens, Moraxella bovoculi 237,Leptospira inadai, Lachnospiraceae bacterium ND2006, Porphyromonascrevioricanis 3, Prevotella disiens, and Porphyromonas macacae. Cpf1from Francisella novicida U112 (FnCpf1; assigned UniProt accessionnumber A0Q7Q2) is an exemplary Cpf1 protein.

Cas proteins can be wild type proteins (i.e., those that occur in nature), modified Cas proteins (i.e., Cas protein variants), or fragments of wild type or modified Cas proteins. Cas proteins can also be active variants or fragments with respect to catalytic activity of wild type or modified Cas proteins. Active variants or fragments with respect to catalytic activity can comprise at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the wild type or modified Cas protein or a portion thereof, wherein the active variants retain the ability to cut at a desired cleavage site and hence retain nick-inducing or double-strand-break-inducing activity. Assays for nick-inducing or double-strand-break-inducing activity are known and generally measure the overall activity and specificity of the Cas protein on DNA substrates containing the cleavage site.

Cas proteins can be modified to increase or decrease one or more of nucleic acid binding affinity, nucleic acid binding specificity, and enzymatic activity. Cas proteins can also be modified to change any other activity or property of the protein, such as stability. For example, one or more nuclease domains of the Cas protein can be modified, deleted, or inactivated, or a Cas protein can be truncated to remove domains that are not essential for the function of the protein or to optimize (e.g., enhance or reduce) the activity or a property of the Cas protein.

One example of a modified Cas protein is the modified SpCas9-HF1protein, which is a high-fidelity variant of Streptococcus pyogenes Cas9harboring alterations (N497A/R661A/Q695A/Q926A) designed to reducenon-specific DNA contacts. See, e.g., Kleinstiver et al. (2016) Nature529(7587):490-495, herein incorporated by reference in its entirety forall purposes. Another example of a modified Cas protein is the modifiedeSpCas9 variant (K848A/K1003A/R1060A) designed to reduce off-targeteffects. See, e.g., Slaymaker et al. (2016) Science 351(6268):84-88,herein incorporated by reference in its entirety for all purposes. OtherSpCas9 variants include K855A and K810A/K1003A/R1060A. These and othermodified Cas proteins are reviewed, e.g., in Cebrian-Serrano and Davies(2017) Mamm. Genome 28(7):247-261, herein incorporated by reference inits entirety for all purposes. Another example of a modified Cas9protein is xCas9, which is a SpCas9 variant that can recognize anexpanded range of PAM sequences. See, e.g., Hu et al. (2018) Nature556:57-63, herein incorporated by reference in its entirety for allpurposes.

Cas proteins can comprise at least one nuclease domain, such as a DNasedomain. For example, a wild type Cpf1 protein generally comprises aRuvC-like domain that cleaves both strands of target DNA, perhaps in adimeric configuration. Cas proteins can also comprise at least twonuclease domains, such as DNase domains. For example, a wild type Cas9protein generally comprises a RuvC-like nuclease domain and an HNH-likenuclease domain. The RuvC and HNH domains can each cut a differentstrand of double-stranded DNA to make a double-stranded break in theDNA. See, e.g., Jinek et al. (2012) Science 337:816-821, hereinincorporated by reference in its entirety for all purposes.

One or more of the nuclease domains can be deleted or mutated so that they are no longer functional or have reduced nuclease activity. For example, if one of the nuclease domains is deleted or mutated in a Cas9 protein, the resulting Cas9 protein can be referred to as a nickase and can generate a single-strand break within a double-stranded target DNA but not a double-strand break (i.e., it can cleave the complementary strand or the non-complementary strand, but not both). An example of a mutation that converts Cas9 into a nickase is a D10A (aspartate to alanine at position 10 of Cas9) mutation in the RuvC domain of Cas9 from S. pyogenes. Likewise, H839A (histidine to alanine at amino acid position 839), H840A (histidine to alanine at amino acid position 840), or N863A (asparagine to alanine at amino acid position 863) in the HNH domain of Cas9 from S. pyogenes can convert the Cas9 into a nickase. Other examples of mutations that convert Cas9 into a nickase include the corresponding mutations to Cas9 from S. thermophilus. See, e.g., Sapranauskas et al. (2011) Nucleic Acids Res. 39(21):9275-9282 and WO 2013/141680, each of which is herein incorporated by reference in its entirety for all purposes. Such mutations can be generated using methods such as site-directed mutagenesis, PCR-mediated mutagenesis, or total gene synthesis. Examples of other mutations creating nickases can be found, for example, in WO 2013/176772 and WO 2013/142578, each of which is herein incorporated by reference in its entirety for all purposes.
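The substitution notation used above (e.g., D10A, meaning aspartate at position 10 replaced by alanine) can be applied to a protein sequence in silico as sketched below. The helper name and the short peptide are hypothetical and serve only to illustrate the notation; the peptide is a toy string chosen so that position 10 is an aspartate and is not presented as any Cas9 sequence.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a substitution written as <wild type residue><1-based position><new residue>."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognized substitution: {mutation!r}")
    wild_type, position, replacement = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1  # the notation is 1-based, Python strings are 0-based
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]


# Hypothetical toy peptide with an aspartate (D) at position 10.
toy_peptide = "MDKKYSIGLDIGTNSVGW"
print(apply_substitution(toy_peptide, "D10A"))
```

Checking the wild type residue before substituting guards against applying a mutation to the wrong ortholog or to a sequence with a different numbering, which matters when "corresponding mutations" in other Cas9 proteins are intended.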

Examples of inactivating mutations in the catalytic domains of xCas9 are the same as those described above for SpCas9. Examples of inactivating mutations in the catalytic domains of Staphylococcus aureus Cas9 proteins are also known. For example, the Staphylococcus aureus Cas9 enzyme (SaCas9) may comprise a substitution at position N580 (e.g., N580A substitution) to create a nickase. Alternatively, the SaCas9 enzyme may comprise a substitution at position D10 (e.g., D10A substitution) to generate a nickase. See, e.g., WO 2016/106236, herein incorporated by reference in its entirety for all purposes. Examples of inactivating mutations in the catalytic domains of Nme2Cas9 are also known (e.g., combination of D16A and H588A). Examples of inactivating mutations in the catalytic domains of St1Cas9 are also known (e.g., combination of D9A, D598A, H599A, and N622A). Examples of inactivating mutations in the catalytic domains of St3Cas9 are also known (e.g., combination of D10A and N870A). Examples of inactivating mutations in the catalytic domains of CjCas9 are also known (e.g., combination of D8A and H559A). Examples of inactivating mutations in the catalytic domains of FnCas9 and RHA FnCas9 are also known (e.g., N995A).

Examples of inactivating mutations in the catalytic domains of Cpf1 proteins are also known. With reference to Cpf1 proteins from Francisella novicida U112 (FnCpf1), Acidaminococcus sp. BV3L6 (AsCpf1), Lachnospiraceae bacterium ND2006 (LbCpf1), and Moraxella bovoculi 237 (MbCpf1), such mutations can include mutations at positions 908, 993, or 1263 of AsCpf1 or corresponding positions in Cpf1 orthologs, or positions 832, 925, 947, or 1180 of LbCpf1 or corresponding positions in Cpf1 orthologs. Such mutations can include, for example, one or more of mutations D908A, E993A, and D1263A of AsCpf1 or corresponding mutations in Cpf1 orthologs, or D832A, E925A, D947A, and D1180A of LbCpf1 or corresponding mutations in Cpf1 orthologs. See, e.g., US 2016/0208243, herein incorporated by reference in its entirety for all purposes.

Cas proteins can also be operably linked to heterologous polypeptides as fusion proteins. For example, a Cas protein can be fused to a cleavage domain. See WO 2014/089290, herein incorporated by reference in its entirety for all purposes. Cas proteins can also be fused to a heterologous polypeptide providing increased or decreased stability. The fused domain or heterologous polypeptide can be located at the N-terminus, the C-terminus, or internally within the Cas protein.

As one example, a Cas protein can be fused to one or more heterologouspolypeptides that provide for subcellular localization. Suchheterologous polypeptides can include, for example, one or more nuclearlocalization signals (NLS) such as the monopartite SV40 NLS and/or abipartite alpha-importin NLS for targeting to the nucleus, amitochondrial localization signal for targeting to the mitochondria, anER retention signal, and the like. See, e.g., Lange et al. (2007) J.Biol. Chem. 282(8):5101-5105, herein incorporated by reference in itsentirety for all purposes. Such subcellular localization signals can belocated at the N-terminus, the C-terminus, or anywhere within the Casprotein. An NLS can comprise a stretch of basic amino acids, and can bea monopartite sequence or a bipartite sequence. Optionally, a Casprotein can comprise two or more NLSs, including an NLS (e.g., analpha-importin NLS or a monopartite NLS) at the N-terminus and an NLS(e.g., an SV40 NLS or a bipartite NLS) at the C-terminus. A Cas proteincan also comprise two or more NLSs at the N-terminus and/or two or moreNLSs at the C-terminus.

Cas proteins can also be operably linked to a cell-penetrating domain orprotein transduction domain. For example, the cell-penetrating domaincan be derived from the HIV-1 TAT protein, the TLM cell-penetratingmotif from human hepatitis B virus, MPG, Pep-1, VP22, a cell penetratingpeptide from Herpes simplex virus, or a polyarginine peptide sequence.See, e.g., WO 2014/089290 and WO 2013/176772, each of which is hereinincorporated by reference in its entirety for all purposes. Thecell-penetrating domain can be located at the N-terminus, theC-terminus, or anywhere within the Cas protein.

Cas proteins provided as mRNAs can be modified for improved stability and/or immunogenicity properties. The modifications may be made to one or more nucleosides within the mRNA. Examples of chemical modifications to mRNA nucleobases include pseudouridine, 1-methyl-pseudouridine, and 5-methyl-cytidine. For example, capped and polyadenylated Cas mRNA containing N1-methyl pseudouridine can be used. Likewise, Cas mRNAs can be modified by depletion of uridine using synonymous codons.

Guide RNAs.

A “guide RNA” or “gRNA” is an RNA molecule that binds to a Cas protein(e.g., Cas9 protein) and targets the Cas protein to a specific locationwithin a target DNA. Guide RNAs can comprise two segments: a“DNA-targeting segment” and a “protein-binding segment.” “Segment”includes a section or region of a molecule, such as a contiguous stretchof nucleotides in an RNA. Some gRNAs, such as those for Cas9, cancomprise two separate RNA molecules: an “activator-RNA” (e.g., tracrRNA)and a “targeter-RNA” (e.g., CRISPR RNA or crRNA). Other gRNAs are asingle RNA molecule (single RNA polynucleotide), which can also becalled a “single-molecule gRNA,” a “single-guide RNA,” or an “sgRNA.”See, e.g., WO 2013/176772, WO 2014/065596, WO 2014/089290, WO2014/093622, WO 2014/099750, WO 2013/142578, and WO 2014/131833, each ofwhich is herein incorporated by reference in its entirety for allpurposes. For Cas9, for example, a single-guide RNA can comprise a crRNAfused to a tracrRNA (e.g., via a linker). For Cpf1, for example, only acrRNA is needed to achieve binding to and/or cleavage of a targetsequence. The terms “guide RNA” and “gRNA” include both double-molecule(i.e., modular) gRNAs and single-molecule gRNAs.

An exemplary two-molecule gRNA comprises a crRNA-like (“CRISPR RNA” or “targeter-RNA” or “crRNA” or “crRNA repeat”) molecule and a corresponding tracrRNA-like (“trans-acting CRISPR RNA” or “activator-RNA” or “tracrRNA”) molecule. A crRNA comprises both the DNA-targeting segment (single-stranded) of the gRNA and a stretch of nucleotides (i.e., the crRNA tail) that forms one half of the dsRNA duplex of the protein-binding segment of the gRNA. An example of a crRNA tail, located downstream (3′) of the DNA-targeting segment, comprises, consists essentially of, or consists of GUUUUAGAGCUAUGCU (SEQ ID NO: 3). Any of the DNA-targeting segments disclosed herein can be joined to the 5′ end of SEQ ID NO: 3 to form a crRNA.

A corresponding tracrRNA (activator-RNA) comprises a stretch of nucleotides that forms the other half of the dsRNA duplex of the protein-binding segment of the gRNA. A stretch of nucleotides of a crRNA is complementary to and hybridizes with a stretch of nucleotides of a tracrRNA to form the dsRNA duplex of the protein-binding domain of the gRNA. As such, each crRNA can be said to have a corresponding tracrRNA. An example of a tracrRNA sequence comprises, consists essentially of, or consists of AGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUU (SEQ ID NO: 4). Other examples of tracrRNA sequences comprise, consist essentially of, or consist of AAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU (SEQ ID NO: 12) or GUUGGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC (SEQ ID NO: 13).

In systems in which both a crRNA and a tracrRNA are needed, the crRNAand the corresponding tracrRNA hybridize to form a gRNA. In systems inwhich only a crRNA is needed, the crRNA can be the gRNA. The crRNAadditionally provides the single-stranded DNA-targeting segment thathybridizes to the complementary strand of a target DNA. If used formodification within a cell, the exact sequence of a given crRNA ortracrRNA molecule can be designed to be specific to the species in whichthe RNA molecules will be used. See, e.g., Mali et al. (2013) Science339(6121):823-826; Jinek et al. (2012) Science 337(6096):816-821; Hwanget al. (2013) Nat. Biotechnol. 31(3):227-229; Jiang et al. (2013) Nat.Biotechnol. 31(3):233-239; and Cong et al. (2013) Science339(6121):819-823, each of which is herein incorporated by reference inits entirety for all purposes.

The DNA-targeting segment (crRNA) of a given gRNA comprises a nucleotide sequence that is complementary to a sequence on the complementary strand of the target DNA, as described in more detail below. The DNA-targeting segment of a gRNA interacts with the target DNA in a sequence-specific manner via hybridization (i.e., base pairing). As such, the nucleotide sequence of the DNA-targeting segment may vary and determines the location within the target DNA with which the gRNA and the target DNA will interact. The DNA-targeting segment of a subject gRNA can be modified to hybridize to any desired sequence within a target DNA. Naturally occurring crRNAs differ depending on the CRISPR/Cas system and organism but often contain a targeting segment of between 21 and 72 nucleotides in length, flanked by two direct repeats (DR) of between 21 and 46 nucleotides in length (see, e.g., WO 2014/131833, herein incorporated by reference in its entirety for all purposes). In the case of S. pyogenes, the DRs are 36 nucleotides long and the targeting segment is 30 nucleotides long. The 3′ located DR is complementary to and hybridizes with the corresponding tracrRNA, which in turn binds to the Cas protein.

The DNA-targeting segment can have, for example, a length of at leastabout 12, 15, 17, 18, 19, 20, 25, 30, 35, or 40 nucleotides. SuchDNA-targeting segments can have, for example, a length from about 12 toabout 100, from about 12 to about 80, from about 12 to about 50, fromabout 12 to about 40, from about 12 to about 30, from about 12 to about25, or from about 12 to about 20 nucleotides. For example, the DNAtargeting segment can be from about 15 to about 25 nucleotides (e.g.,from about 17 to about 20 nucleotides, or about 17, 18, 19, or 20nucleotides). See, e.g., US 2016/0024523, herein incorporated byreference in its entirety for all purposes. For Cas9 from S. pyogenes, atypical DNA-targeting segment is between 16 and 20 nucleotides in lengthor between 17 and 20 nucleotides in length. For Cas9 from S. aureus, atypical DNA-targeting segment is between 21 and 23 nucleotides inlength. For Cpf1, a typical DNA-targeting segment is at least 16nucleotides in length or at least 18 nucleotides in length.

TracrRNAs can be in any form (e.g., full-length tracrRNAs or activepartial tracrRNAs) and of varying lengths. They can include primarytranscripts or processed forms. For example, tracrRNAs (as part of asingle-guide RNA or as a separate molecule as part of a two-moleculegRNA) may comprise, consist essentially of, or consist of all or aportion of a wild type tracrRNA sequence (e.g., about or more than about20, 26, 32, 45, 48, 54, 63, 67, 85, or more nucleotides of a wild typetracrRNA sequence). Examples of wild type tracrRNA sequences from S.pyogenes include 171-nucleotide, 89-nucleotide, 75-nucleotide, and65-nucleotide versions. See, e.g., Deltcheva et al. (2011) Nature471(7340):602-607; WO 2014/093661, each of which is herein incorporatedby reference in its entirety for all purposes. Examples of tracrRNAswithin single-guide RNAs (sgRNAs) include the tracrRNA segments foundwithin +48, +54, +67, and +85 versions of sgRNAs, where “+n” indicatesthat up to the +n nucleotide of wild type tracrRNA is included in thesgRNA. See U.S. Pat. No. 8,697,359, herein incorporated by reference inits entirety for all purposes.

The percent complementarity between the DNA-targeting segment of theguide RNA and the complementary strand of the target DNA can be at least60% (e.g., at least 65%, at least 70%, at least 75%, at least 80%, atleast 85%, at least 90%, at least 95%, at least 97%, at least 98%, atleast 99%, or 100%). The percent complementarity between theDNA-targeting segment and the complementary strand of the target DNA canbe at least 60% over about 20 contiguous nucleotides. As an example, thepercent complementarity between the DNA-targeting segment and thecomplementary strand of the target DNA can be 100% over the 14contiguous nucleotides at the 5′ end of the complementary strand of thetarget DNA and as low as 0% over the remainder. In such a case, theDNA-targeting segment can be considered to be 14 nucleotides in length.As another example, the percent complementarity between theDNA-targeting segment and the complementary strand of the target DNA canbe 100% over the seven contiguous nucleotides at the 5′ end of thecomplementary strand of the target DNA and as low as 0% over theremainder. In such a case, the DNA-targeting segment can be consideredto be 7 nucleotides in length. In some guide RNAs, at least 17nucleotides within the DNA-targeting segment are complementary to thecomplementary strand of the target DNA. For example, the DNA-targetingsegment can be 20 nucleotides in length and can comprise 1, 2, or 3mismatches with the complementary strand of the target DNA. In oneexample, the mismatches are not adjacent to the region of thecomplementary strand corresponding to the protospacer adjacent motif(PAM) sequence (i.e., the reverse complement of the PAM sequence) (e.g.,the mismatches are in the 5′ end of the DNA-targeting segment of theguide RNA, or the mismatches are at least 2, 3, 4, 5, 6, 7, 8, 9, 10,11, 12, 13, 14, 15, 16, 17, 18, or 19 base pairs away from the region ofthe complementary strand corresponding to the PAM sequence).
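The percent complementarity discussed above can be illustrated with a minimal Python sketch. It assumes that the DNA-targeting segment (RNA) and the complementary strand of the target DNA are both written 5′ to 3′ and have the same length, counts only Watson-Crick pairs (G-U wobble pairs are treated as mismatches), and uses invented sequences; none of these choices is dictated by the text.

```python
# Watson-Crick pairs written as (RNA base in guide, DNA base in target strand).
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(targeting_segment_rna, complementary_strand_dna):
    """Percent of guide positions that base-pair with the antiparallel target strand."""
    guide = targeting_segment_rna.upper()
    # Reverse the DNA strand so position i of the guide faces its pairing partner.
    target = complementary_strand_dna.upper()[::-1]
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((g, t) in PAIRS for g, t in zip(guide, target))
    return 100.0 * paired / len(guide)


# Hypothetical 20-nucleotide segment against a strand with three mismatches.
guide = "G" * 20
strand = "C" * 17 + "AAA"  # 5'->3'; its 3' end faces the 5' end of the guide
print(percent_complementarity(guide, strand))
```

In this toy case the three mismatches fall at the 5′ end of the DNA-targeting segment, consistent with the example above in which a 20-nucleotide segment carries 1, 2, or 3 mismatches away from the PAM-proximal region.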

The protein-binding segment of a gRNA can comprise two stretches of nucleotides that are complementary to one another. The complementary nucleotides of the protein-binding segment hybridize to form a double-stranded RNA duplex (dsRNA). The protein-binding segment of a subject gRNA interacts with a Cas protein, and the gRNA directs the bound Cas protein to a specific nucleotide sequence within target DNA via the DNA-targeting segment.

Single-guide RNAs can comprise a DNA-targeting segment joined to a scaffold sequence (i.e., the protein-binding or Cas-binding sequence of the guide RNA). For example, such guide RNAs can have a 5′ DNA-targeting segment and a 3′ scaffold sequence. Exemplary scaffold sequences comprise, consist essentially of, or consist of: GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCU (version 1; SEQ ID NO: 5); GUUGGAACCAUUCAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC (version 2; SEQ ID NO: 6); GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC (version 3; SEQ ID NO: 7); and GUUUAAGAGCUAUGCUGGAAACAGCAUAGCAAGUUUAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGC (version 4; SEQ ID NO: 8). Other exemplary scaffold sequences comprise, consist essentially of, or consist of:

(version 5; SEQ ID NO: 14) GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUU;
(version 6; SEQ ID NO: 15) GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU; or
(version 7; SEQ ID NO: 16) GUUUAAGAGCUAUGCUGGAAACAGCAUAGCAAGUUUAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUC.

Guide RNAs targeting any guide RNA target sequence can include, for example, a DNA-targeting segment on the 5′ end of the guide RNA fused to any of the exemplary guide RNA scaffold sequences on the 3′ end of the guide RNA. That is, a DNA-targeting segment can be joined to the 5′ end of any one of SEQ ID NOS: 5-8 to form a single guide RNA (chimeric guide RNA). Likewise, a DNA-targeting segment can be joined to the 5′ end of any one of SEQ ID NOS: 14-16 to form a single guide RNA (chimeric guide RNA). Guide RNA versions 1, 2, 3, and 4 as disclosed elsewhere herein refer to DNA-targeting segments (i.e., guide sequences or guides) joined with scaffold versions 1, 2, 3, and 4, respectively. Guide RNA versions 5, 6, and 7 as disclosed elsewhere herein refer to DNA-targeting segments (i.e., guide sequences or guides) joined with scaffold versions 5, 6, and 7, respectively.
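As a non-limiting illustration of joining a DNA-targeting segment to the 5′ end of a scaffold, the sketch below concatenates a hypothetical 20-nucleotide segment with scaffold version 1 (SEQ ID NO: 5). The function name and the example segment are assumptions introduced only for this sketch; any of the other scaffold versions could be passed in the same way.

```python
# Scaffold version 1 (SEQ ID NO: 5), as recited in the text.
SCAFFOLD_V1 = (
    "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCU"
)

RNA_BASES = set("ACGU")


def build_single_guide_rna(dna_targeting_segment: str, scaffold: str = SCAFFOLD_V1) -> str:
    """Join a 5' DNA-targeting segment to a 3' scaffold to form an sgRNA.

    Both inputs are RNA sequences written 5' to 3'.
    """
    segment = dna_targeting_segment.upper()
    scaffold = scaffold.upper()
    for name, seq in (("DNA-targeting segment", segment), ("scaffold", scaffold)):
        if not seq or set(seq) - RNA_BASES:
            raise ValueError(f"{name} must be a non-empty RNA sequence (A, C, G, U)")
    return segment + scaffold


# Hypothetical DNA-targeting segment, for illustration only.
sgrna_v1 = build_single_guide_rna("GACGUUACCGGAUCAAGCUU")
```

The validation step rejects DNA input (a sequence containing T), so a guide RNA target sequence must first be converted to RNA before it is passed in.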

Guide RNAs can include modifications or sequences that provide for additional desirable features (e.g., modified or regulated stability; subcellular targeting; tracking with a fluorescent label; a binding site for a protein or protein complex; and the like). Examples of such modifications include, for example, a 5′ cap (e.g., a 7-methylguanylate cap (m7G)); a 3′ polyadenylated tail (i.e., a 3′ poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and/or protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (i.e., a hairpin); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, and so forth); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof. Other examples of modifications include engineered stem loop duplex structures, engineered bulge regions, engineered hairpins 3′ of the stem loop duplex structure, or any combination thereof. See, e.g., US 2015/0376586, herein incorporated by reference in its entirety for all purposes. A bulge can be an unpaired region of nucleotides within the duplex made up of the crRNA-like region and the minimum tracrRNA-like region. A bulge can comprise, on one side of the duplex, an unpaired 5′-XXXY-3′ where X is any purine and Y can be a nucleotide that can form a wobble pair with a nucleotide on the opposite strand, and an unpaired nucleotide region on the other side of the duplex.

Unmodified nucleic acids can be prone to degradation. Exogenous nucleic acids can also induce an innate immune response. Modifications can help introduce stability and reduce immunogenicity. Guide RNAs can comprise modified nucleosides and modified nucleotides including, for example, one or more of the following: (1) alteration or replacement of one or both of the non-linking phosphate oxygens and/or of one or more of the linking phosphate oxygens in the phosphodiester backbone linkage; (2) alteration or replacement of a constituent of the ribose sugar such as alteration or replacement of the 2′ hydroxyl on the ribose sugar; (3) replacement of the phosphate moiety with dephospho linkers; (4) modification or replacement of a naturally occurring nucleobase; (5) replacement or modification of the ribose-phosphate backbone; (6) modification of the 3′ end or 5′ end of the oligonucleotide (e.g., removal, modification or replacement of a terminal phosphate group or conjugation of a moiety); and (7) modification of the sugar. Other possible guide RNA modifications include modifications of or replacement of uracils or poly-uracil tracts. See, e.g., WO 2015/048577 and US 2016/0237455, each of which is herein incorporated by reference in its entirety for all purposes. Similar modifications can be made to Cas-encoding nucleic acids, such as Cas mRNAs.

As one example, nucleotides at the 5′ or 3′ end of a guide RNA can include phosphorothioate linkages (e.g., the bases can have a modified phosphate group that is a phosphorothioate group). For example, a guide RNA can include phosphorothioate linkages between the 2, 3, or 4 terminal nucleotides at the 5′ or 3′ end of the guide RNA. As another example, nucleotides at the 5′ and/or 3′ end of a guide RNA can have 2′-O-methyl modifications. For example, a guide RNA can include 2′-O-methyl modifications at the 2, 3, or 4 terminal nucleotides at the 5′ and/or 3′ end of the guide RNA (e.g., the 5′ end). See, e.g., WO 2017/173054 A1 and Finn et al. (2018) Cell Rep. 22(9):2227-2235, each of which is herein incorporated by reference in its entirety for all purposes. In one specific example, the guide RNA comprises 2′-O-methyl analogs and 3′ phosphorothioate internucleotide linkages at the first three 5′ and 3′ terminal RNA residues. In another specific example, the guide RNA is modified such that all 2′OH groups that do not interact with the Cas9 protein are replaced with 2′-O-methyl analogs, and the tail region of the guide RNA, which has minimal interaction with Cas9, is modified with 5′ and 3′ phosphorothioate internucleotide linkages. See, e.g., Yin et al. (2017) Nat. Biotech. 35(12):1179-1187, herein incorporated by reference in its entirety for all purposes. Other examples of modified guide RNAs are provided, e.g., in WO 2018/107028 A1, herein incorporated by reference in its entirety for all purposes.

gRNAs can be prepared by various other methods. For example, gRNAs can be prepared by in vitro transcription using, for example, T7 RNA polymerase (see, e.g., WO 2014/089290 and WO 2014/065596, each of which is herein incorporated by reference in its entirety for all purposes). Guide RNAs can also be a synthetically produced molecule prepared by chemical synthesis. For example, a guide RNA can be chemically synthesized to include 2′-O-methyl analogs and 3′ phosphorothioate internucleotide linkages at the first three 5′ and 3′ terminal RNA residues.

Guide RNA Target Sequences.

Target DNAs for guide RNAs include nucleic acid sequences present in a DNA to which a DNA-targeting segment of a gRNA will bind, provided sufficient conditions for binding exist. Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art (see, e.g., Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001), herein incorporated by reference in its entirety for all purposes). The strand of the target DNA that is complementary to and hybridizes with the gRNA can be called the “complementary strand,” and the strand of the target DNA that is complementary to the “complementary strand” (and is therefore not complementary to the Cas protein or gRNA) can be called the “noncomplementary strand” or “template strand.”

The target DNA includes both the sequence on the complementary strand to which the guide RNA hybridizes and the corresponding sequence on the non-complementary strand (e.g., adjacent to the protospacer adjacent motif (PAM)). The term “guide RNA target sequence” as used herein refers specifically to the sequence on the non-complementary strand corresponding to (i.e., the reverse complement of) the sequence to which the guide RNA hybridizes on the complementary strand. That is, the guide RNA target sequence refers to the sequence on the non-complementary strand adjacent to the PAM (e.g., upstream or 5′ of the PAM in the case of Cas9). A guide RNA target sequence is equivalent to the DNA-targeting segment of a guide RNA, but with thymines instead of uracils. As one example, a guide RNA target sequence for an SpCas9 enzyme can refer to the sequence upstream of the 5′-NGG-3′ PAM on the non-complementary strand. A guide RNA is designed to have complementarity to the complementary strand of a target DNA, where hybridization between the DNA-targeting segment of the guide RNA and the complementary strand of the target DNA promotes the formation of a CRISPR complex. Full complementarity is not necessarily required, provided that there is sufficient complementarity to cause hybridization and promote formation of a CRISPR complex. If a guide RNA is referred to herein as targeting a guide RNA target sequence, what is meant is that the guide RNA hybridizes to the complementary strand sequence of the target DNA that is the reverse complement of the guide RNA target sequence on the non-complementary strand.
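The relationships among the guide RNA target sequence, the DNA-targeting segment, and the complementary strand can be illustrated with a short, non-limiting sketch. The function names and the example sequence below are hypothetical and are not part of the disclosed methods; the sketch assumes a fully complementary guide.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def targeting_segment_from_target(guide_rna_target_sequence: str) -> str:
    """DNA-targeting segment (RNA) for a guide RNA target sequence (DNA).

    The two are equivalent except that the RNA has uracils in place of thymines.
    """
    return guide_rna_target_sequence.upper().replace("T", "U")


def hybridized_sequence(guide_rna_target_sequence: str) -> str:
    """Sequence on the complementary strand to which the guide RNA hybridizes.

    This is the reverse complement of the guide RNA target sequence,
    written 5' to 3'.
    """
    return guide_rna_target_sequence.upper().translate(DNA_COMPLEMENT)[::-1]


# Hypothetical guide RNA target sequence on the non-complementary strand.
target = "GACGTTACCGGATCAAGCTT"
segment = targeting_segment_from_target(target)
bound = hybridized_sequence(target)
```

Taking the reverse complement twice returns the original guide RNA target sequence, which is a quick consistency check on the strand conventions used here.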

A target DNA or guide RNA target sequence can comprise any polynucleotide, and can be located, for example, in the nucleus or cytoplasm of a cell or within an organelle of a cell, such as a mitochondrion or chloroplast. A target DNA or guide RNA target sequence can be any nucleic acid sequence endogenous or exogenous to a cell. The guide RNA target sequence can be a sequence coding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory sequence) or can include both.

Site-specific binding and cleavage of a target DNA by a Cas protein can occur at locations determined by both (i) base-pairing complementarity between the guide RNA and the complementary strand of the target DNA and (ii) a short motif, called the protospacer adjacent motif (PAM), in the non-complementary strand of the target DNA. The PAM can flank the guide RNA target sequence. Optionally, the guide RNA target sequence can be flanked on the 3′ end by the PAM (e.g., for Cas9). Alternatively, the guide RNA target sequence can be flanked on the 5′ end by the PAM (e.g., for Cpf1). For example, the cleavage site of Cas proteins can be about 1 to about 10 or about 2 to about 5 base pairs (e.g., 3 base pairs) upstream or downstream of the PAM sequence (e.g., within the guide RNA target sequence). In the case of SpCas9, the PAM sequence (i.e., on the non-complementary strand) can be 5′-N₁GG-3′, where N₁ is any DNA nucleotide, and where the PAM is immediately 3′ of the guide RNA target sequence on the non-complementary strand of the target DNA. As such, the sequence corresponding to the PAM on the complementary strand (i.e., the reverse complement) would be 5′-CCN₂-3′, where N₂ is any DNA nucleotide and is immediately 5′ of the sequence to which the DNA-targeting segment of the guide RNA hybridizes on the complementary strand of the target DNA. In some such cases, N₁ and N₂ can be complementary and the N₁-N₂ base pair can be any base pair (e.g., N₁=C and N₂=G; N₁=G and N₂=C; N₁=A and N₂=T; or N₁=T and N₂=A). In the case of Cas9 from S. aureus, the PAM can be NNGRRT or NNGRR, where N can be A, G, C, or T, and R can be G or A. In the case of Cas9 from C. jejuni, the PAM can be, for example, NNNNACAC or NNNNRYAC, where N can be A, G, C, or T, R can be G or A, and Y can be C or T. In some cases (e.g., for FnCpf1), the PAM sequence can be upstream of the 5′ end and have the sequence 5′-TTN-3′.

An example of a guide RNA target sequence is a 20-nucleotide DNA sequence immediately preceding an NGG motif recognized by an SpCas9 protein. Two examples of guide RNA target sequences plus PAMs are GN₁₉NGG (SEQ ID NO: 9) and N₂₀NGG (SEQ ID NO: 10). See, e.g., WO 2014/165825, herein incorporated by reference in its entirety for all purposes. The guanine at the 5′ end can facilitate transcription by RNA polymerase in cells. Other examples of guide RNA target sequences plus PAMs can include two guanine nucleotides at the 5′ end (e.g., GGN₂₀NGG; SEQ ID NO: 11) to facilitate efficient transcription by T7 polymerase in vitro. See, e.g., WO 2014/065596, herein incorporated by reference in its entirety for all purposes. Other guide RNA target sequences plus PAMs can have between 4 and 22 nucleotides in length of SEQ ID NOS: 9-11, including the 5′ G or GG and the 3′ GG or NGG. Yet other guide RNA target sequences plus PAMs can have between 14 and 20 nucleotides in length of SEQ ID NOS: 9-11.
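As a non-limiting illustration of the N₂₀NGG pattern, the sketch below scans one strand of a DNA sequence for 20-nucleotide guide RNA target sequences immediately followed by an NGG PAM. The function name, the dictionary keys, and the example sequence are hypothetical. The reported cut position assumes a break 3 base pairs upstream of the PAM, which is only the example value given above, and the opposite strand is not searched.

```python
import re


def find_spcas9_sites(non_complementary_strand: str, guide_length: int = 20) -> list:
    """Find guide RNA target sequences followed by an NGG PAM on one strand.

    Returns one dict per site with 0-based coordinates. "cut_index" marks the
    assumed break between positions cut_index - 1 and cut_index, i.e. 3 bp
    upstream of the PAM.
    """
    seq = non_complementary_strand.upper()
    # A lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_length)
    sites = []
    for match in pattern.finditer(seq):
        start = match.start()
        pam_start = start + guide_length
        sites.append(
            {
                "target": match.group(1),
                "pam": match.group(2),
                "start": start,
                "cut_index": pam_start - 3,
            }
        )
    return sites


# Hypothetical sequence containing a single N20-NGG site.
example = "TT" + "GACGTTACCGGATCAAGCTT" + "TGG" + "AA"
sites = find_spcas9_sites(example)
```

To cover both strands, the same function could be run on the reverse complement of the input, with coordinates mapped back accordingly.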

Formation of a CRISPR complex hybridized to a target DNA can result in cleavage of one or both strands of the target DNA within or near the region corresponding to the guide RNA target sequence (i.e., the guide RNA target sequence on the non-complementary strand of the target DNA and the reverse complement on the complementary strand to which the guide RNA hybridizes). For example, the cleavage site can be within the guide RNA target sequence (e.g., at a defined location relative to the PAM sequence). The “cleavage site” includes the position of a target DNA at which a Cas protein produces a single-strand break or a double-strand break. The cleavage site can be on only one strand (e.g., when a nickase is used) or on both strands of a double-stranded DNA. Cleavage sites can be at the same position on both strands (producing blunt ends; e.g., Cas9) or can be at different sites on each strand (producing staggered ends (i.e., overhangs); e.g., Cpf1). Staggered ends can be produced, for example, by using two Cas proteins, each of which produces a single-strand break at a different cleavage site on a different strand, thereby producing a double-strand break. For example, a first nickase can create a single-strand break on the first strand of double-stranded DNA (dsDNA), and a second nickase can create a single-strand break on the second strand of dsDNA such that overhanging sequences are created. In some cases, the guide RNA target sequence or cleavage site of the nickase on the first strand is separated from the guide RNA target sequence or cleavage site of the nickase on the second strand by at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 250, 500, or 1,000 base pairs.

C. Other Nuclease Agents

Any other type of known rare-cutting nuclease agent can also be used in the methods described herein. One example of such a nuclease agent is a Transcription Activator-Like Effector Nuclease (TALEN). TAL effector nucleases are a class of sequence-specific nucleases that can be used to make double-strand breaks at specific target sequences in DNA. TAL effector nucleases are created by fusing a native or engineered transcription activator-like (TAL) effector, or functional part thereof, to the catalytic domain of an endonuclease, such as, for example, FokI. The unique, modular TAL effector DNA binding domain allows for the design of proteins with potentially any given DNA recognition specificity. Thus, the DNA binding domains of the TAL effector nucleases can be engineered to recognize specific DNA target sites and thus used to make double-strand breaks at desired target sequences. See WO 2010/079430; Morbitzer et al. (2010) Proc. Natl. Acad. Sci. U.S.A. 107(50):21617-21622; Scholze & Boch (2010) Virulence 1:428-432; Christian et al. (2010) Genetics 186:757-761; Li et al. (2010) Nucleic Acids Res. 39(1):359-372; and Miller et al. (2011) Nat. Biotechnol. 29:143-148, each of which is herein incorporated by reference in its entirety for all purposes.

Examples of suitable TAL nucleases, and methods for preparing suitable TAL nucleases, are disclosed, e.g., in US 2011/0239315, US 2011/0269234, US 2011/0145940, US 2003/0232410, US 2005/0208489, US 2005/0026157, US 2005/0064474, US 2006/0188987, and US 2006/0063231, each of which is herein incorporated by reference in its entirety for all purposes.

In some TALENs, each monomer of the TALEN comprises 33-35 TAL repeats that recognize a single base pair via two hypervariable residues. The TALEN can be a chimeric protein comprising a TAL-repeat-based DNA binding domain operably linked to an independent nuclease such as a FokI endonuclease. For example, the nuclease agent can comprise a first TAL-repeat-based DNA binding domain and a second TAL-repeat-based DNA binding domain, wherein each of the first and the second TAL-repeat-based DNA binding domains is operably linked to a FokI nuclease, wherein the first and the second TAL-repeat-based DNA binding domain recognize two contiguous target DNA sequences in each strand of the target DNA sequence separated by a spacer sequence of varying length (12-20 bp), and wherein the FokI nuclease subunits dimerize to create an active nuclease that makes a double strand break at a target sequence.

Another example of a suitable nuclease agent is a zinc-finger nuclease (ZFN). In some ZFNs, each monomer of the ZFN comprises 3 or more zinc finger-based DNA binding domains, wherein each zinc finger-based DNA binding domain binds to a 3 bp subsite. In other ZFNs, the ZFN is a chimeric protein comprising a zinc finger-based DNA binding domain operably linked to an independent nuclease such as a FokI endonuclease. For example, the nuclease agent can comprise a first ZFN and a second ZFN, wherein each of the first ZFN and the second ZFN is operably linked to a FokI nuclease subunit, wherein the first and the second ZFN recognize two contiguous target DNA sequences in each strand of the target DNA sequence separated by a spacer of about 5-7 bp, and wherein the FokI nuclease subunits dimerize to create an active nuclease that makes a double strand break. See, e.g., US 2006/0246567; US 2008/0182332; US 2002/0081614; US 2003/0021776; WO 2002/057308; US 2013/0123484; US 2010/0291048; WO 2011/017293; and Gaj et al. (2013) Trends Biotechnol. 31(7):397-405, each of which is herein incorporated by reference in its entirety for all purposes.

Another type of suitable nuclease agent is an engineered meganuclease. Meganucleases have been classified into four families based on conserved sequence motifs: the LAGLIDADG, GIY-YIG, H-N-H, and His-Cys box families. These motifs participate in the coordination of metal ions and hydrolysis of phosphodiester bonds. Meganucleases are notable for their long target sequences, and for tolerating some sequence polymorphisms in their DNA substrates. Meganuclease domains, structure, and function are known. See, for example, Guhan and Muniyappa (2003) Crit. Rev. Biochem. Mol. Biol. 38:199-248; Lucas et al. (2001) Nucleic Acids Res. 29:960-9; Jurica and Stoddard (1999) Cell. Mol. Life Sci. 55:1304-26; Stoddard (2006) Q. Rev. Biophys. 38:49-95; and Moure et al. (2002) Nat. Struct. Biol. 9:764, each of which is herein incorporated by reference in its entirety for all purposes. In some examples, a naturally occurring variant and/or engineered derivative meganuclease is used. Methods for modifying the kinetics, cofactor interactions, expression, optimal conditions, and/or target sequence specificity, and screening for activity are known. See, e.g., Epinat et al. (2003) Nucleic Acids Res. 31:2952-62; Chevalier et al. (2002) Mol. Cell 10:895-905; Gimble et al. (2003) J. Mol. Biol. 334:993-1008; Seligman et al. (2002) Nucleic Acids Res. 30:3870-9; Sussman et al. (2004) J. Mol. Biol. 342:31-41; Rosen et al. (2006) Nucleic Acids Res. 34:4791-800; Chames et al. (2005) Nucleic Acids Res. 33:e178; Smith et al. (2006) Nucleic Acids Res. 34:e149; Gruen et al. (2002) Nucleic Acids Res. 30:e29; Chen and Zhao (2005) Nucleic Acids Res. 33:e154; WO 2005/105989; WO 2003/078619; WO 2006/097854; WO 2006/097853; WO 2006/097784; and WO 2004/031346, each of which is herein incorporated by reference in its entirety for all purposes.

Any meganuclease can be used, including, for example, I-SceI, I-SceII, I-SceIII, I-SceIV, I-SceV, I-SceVI, I-SceVII, I-CeuI, I-CeuAIIP, I-CreI, I-CrepsbIP, I-CrepsbIIP, I-CrepsbIIIP, I-CrepsbIVP, I-TliI, I-PpoI, PI-PspI, F-SceI, F-SceII, F-SuvI, F-TevI, F-TevII, I-AmaI, I-AniI, I-ChuI, I-CmoeI, I-CpaI, I-CpaII, I-CsmI, I-CvuI, I-CvuAIP, I-DdiI, I-DdiII, I-DirI, I-DmoI, I-HmuI, I-HmuII, I-HsNIP, I-LlaI, I-MsoI, I-NaaI, I-NanI, I-NcIIP, I-NgrIP, I-NitI, I-NjaI, I-Nsp236IP, I-PakI, I-PboIP, I-PcuIP, I-PcuAI, I-PcuVI, I-PgrIP, I-PobIP, I-PorI, I-PorIIP, I-PbpIP, I-SpBetaIP, I-ScaI, I-SexIP, I-SneIP, I-SpomI, I-SpomCP, I-SpomIP, I-SpomIIP, I-SquIP, I-Ssp6803I, I-SthPhiJP, I-SthPhiST3P, I-SthPhiSTe3bP, I-TdeIP, I-TevI, I-TevII, I-TevIII, I-UarAP, I-UarHGPAIP, I-UarHGPA13P, I-VinIP, I-ZbiIP, PI-MtuI, PI-MtuHIP, PI-MtuHIIP, PI-PfuI, PI-PfuII, PI-PkoI, PI-PkoII, PI-Rma43812IP, PI-SpBetaIP, PI-SceI, PI-TfuI, PI-TfuII, PI-ThyI, PI-TliI, PI-TliII, or any active variants or fragments thereof.

Meganucleases can recognize, for example, double-stranded DNA sequences of 12 to 40 base pairs. In some cases, the meganuclease recognizes one perfectly matched target sequence in the genome. Some meganucleases are homing nucleases. One type of homing nuclease is the LAGLIDADG family of homing nucleases including, for example, I-SceI, I-CreI, and I-DmoI.

VIII. Selection Cassettes

Any suitable selection cassette can be used in the methods described herein. The term selection cassette refers to an expression cassette that comprises one or more expression control sequences (e.g., a promoter for expression in a bacterial cell and/or other regulatory sequences such as enhancers, post-transcriptional regulatory elements, and poly(A) sequences) operably linked to a nucleic acid encoding a selectable marker. Selection cassettes can allow for selection in bacterial cells, or they can allow for selection in both bacterial and eukaryotic or mammalian cells. As one example, a gene such as neomycin phosphotransferase can be used. Neomycin phosphotransferase confers kanamycin resistance in prokaryotic cells and G418 resistance in eukaryotic cells. Such a gene can be used, for example, in combination with a dual promoter system combining a eukaryotic promoter (e.g., a eukaryotic phosphoglycerate kinase (PGK) promoter) and a prokaryotic promoter (e.g., a prokaryotic EM7 promoter).

Some selection cassettes that can be used in the methods described herein can impart resistance to an antibiotic that would otherwise kill or inhibit the growth of the bacterial cells. For example, a selection cassette can impart resistance to kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, or chloramphenicol. Such selection cassettes and genes that impart resistance to these antibiotics and others are well-known. Cells comprising the selection cassettes can be selected by treating the cells with the antibiotic. Those cells that are resistant to the antibiotic comprise the selection cassette.

Other selection cassettes can comprise reporter genes that can be used to select for cells comprising an intended modification. The term reporter gene refers to a nucleic acid having a sequence encoding a gene product (typically an enzyme) that is easily and quantifiably assayed when a construct comprising the reporter gene sequence operably linked to a heterologous promoter and/or enhancer element is introduced into cells containing (or which can be made to contain) the factors necessary for the activation of the promoter and/or enhancer elements. Examples of reporter genes include, but are not limited to, genes encoding fluorescent proteins. A reporter protein refers to a protein encoded by a reporter gene.

A fluorescent reporter protein is a reporter protein that is detectable based on fluorescence wherein the fluorescence may be either from the reporter protein directly, activity of the reporter protein on a fluorogenic substrate, or a protein with affinity for binding to a fluorescent tagged compound. Examples of fluorescent proteins include green fluorescent proteins (e.g., GFP, GFP-2, tagGFP, turboGFP, eGFP, Emerald, Azami Green, Monomeric Azami Green, CopGFP, AceGFP, and ZsGreen1), yellow fluorescent proteins (e.g., YFP, eYFP, Citrine, Venus, YPet, PhiYFP, and ZsYellow1), blue fluorescent proteins (e.g., BFP, eBFP, eBFP2, Azurite, mKalama1, GFPuv, Sapphire, and T-sapphire), cyan fluorescent proteins (e.g., CFP, eCFP, Cerulean, CyPet, AmCyan1, and Midoriishi-Cyan), red fluorescent proteins (e.g., RFP, mKate, mKate2, mPlum, DsRed monomer, mCherry, mRFP1, DsRed-Express, DsRed2, DsRed-Monomer, HcRed-Tandem, HcRed1, AsRed2, eqFP611, mRaspberry, mStrawberry, and Jred), orange fluorescent proteins (e.g., mOrange, mKO, Kusabira-Orange, Monomeric Kusabira-Orange, mTangerine, and tdTomato), and any other suitable fluorescent protein whose presence in cells can be detected by flow cytometry methods. Cells comprising a fluorescent reporter gene can be selected, for example, by sorting for cells comprising the fluorescent reporter protein encoded by the gene.

IX. Targeted Modifications

Various types of targeted genetic modifications can be introduced using the methods described herein. Such targeted genetic modifications can include, for example, insertion of one or more nucleotides, deletion of one or more nucleotides, or substitution (replacement) of one or more nucleotides. Such insertions, deletions, or replacements can result, for example, in a point mutation, a knockout of a nucleic acid sequence of interest or a portion thereof, a knock-in of a nucleic acid sequence of interest or a portion thereof, a replacement of an endogenous nucleic acid sequence with a heterologous or exogenous nucleic acid sequence, a replacement of an endogenous nucleic acid sequence with a homologous or orthologous nucleic acid sequence (e.g., domain swap, exon swap, intron swap, regulatory sequence swap, or gene swap), alteration of a regulatory element (e.g., promoter or enhancer), a missense mutation, a nonsense mutation, a frame-shift mutation, a truncation mutation, a null mutation, or a combination thereof. For example, at least 1, 2, 3, 4, 5, 7, 8, 9, 10 or more nucleotides can be changed (e.g., deleted, inserted, or substituted) to form the targeted genetic modification. The deletions, insertions, or replacements can be of any size, as disclosed elsewhere herein. See, e.g., Wang et al. (2013) Cell 153:910-918; Mandalos et al. (2012) PLOS One 7:e45768; and Wang et al. (2013) Nat Biotechnol. 31:530-532, each of which is herein incorporated by reference in its entirety.

Deletions, insertions, or replacements can be any length. The deleted, inserted, or replaced nucleic acid can be, for example, from about 1 bp to about 5 bp, from about 5 bp to about 10 bp, from about 10 bp to about 50 bp, from about 50 bp to about 100 bp, from about 100 bp to about 200 bp, from about 200 bp to about 300 bp, from about 300 bp to about 400 bp, from about 400 bp to about 500 bp, from about 500 bp to about 1 kb, from about 1 kb to about 5 kb, from about 5 kb to about 10 kb, from about 10 kb to about 20 kb, from about 20 kb to about 40 kb, from about 40 kb to about 60 kb, from about 60 kb to about 80 kb, from about 80 kb to about 100 kb, from about 100 kb to about 150 kb, from about 150 kb to about 200 kb, from about 200 kb to about 300 kb, from about 300 kb to about 400 kb, or from about 400 kb to about 500 kb.

All patent filings, websites, other publications, accession numbers and the like cited above or below are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a sequence are associated with an accession number at different times, the version associated with the accession number at the effective filing date of this application is meant. The effective filing date means the earlier of the actual filing date or filing date of a priority application referring to the accession number if applicable. Likewise, if different versions of a publication, website or the like are published at different times, the version most recently published at the effective filing date of the application is meant unless otherwise indicated. Any feature, step, element, embodiment, or aspect of the invention can be used in combination with any other unless specifically indicated otherwise. Although the present invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.

BRIEF DESCRIPTION OF THE SEQUENCES

The nucleotide and amino acid sequences listed in the accompanying sequence listing are shown using standard letter abbreviations for nucleotide bases, and three-letter code for amino acids. The nucleotide sequences follow the standard convention of beginning at the 5′ end of the sequence and proceeding forward (i.e., from left to right in each line) to the 3′ end. Only one strand of each nucleotide sequence is shown, but the complementary strand is understood to be included by any reference to the displayed strand. When a nucleotide sequence encoding an amino acid sequence is provided, it is understood that codon degenerate variants thereof that encode the same amino acid sequence are also provided. The amino acid sequences follow the standard convention of beginning at the amino terminus of the sequence and proceeding forward (i.e., from left to right in each line) to the carboxy terminus.

TABLE 2 Description of Sequences.

SEQ ID NO   Type      Description
1           Protein   Cas9 Protein
2           DNA       Cas9 DNA
3           RNA       crRNA tail
4           RNA       tracrRNA
5           RNA       gRNA scaffold v1
6           RNA       gRNA scaffold v2
7           RNA       gRNA scaffold v3
8           RNA       gRNA scaffold v4
9           DNA       guide RNA target sequence plus PAM v1
10          DNA       guide RNA target sequence plus PAM v2
11          DNA       guide RNA target sequence plus PAM v3
12          RNA       tracrRNA v2
13          RNA       tracrRNA v3
14          RNA       gRNA scaffold v5
15          RNA       gRNA scaffold v6
16          RNA       gRNA scaffold v7

EXAMPLES

Example 1. Scarless Introduction of a Targeted Modification into a Large Targeting Vector Via Bacterial Homologous Recombination and Intramolecular Gibson Assembly

Gibson assembly technology joins segments of DNA with homologous ends into a single molecule. It differs from traditional ligations between complementary, staggered ends created by restriction enzymes in that any complementary sequence of a minimal size can be used. As cloning via restriction sites generally results in incorporation of exogenous DNA scars (enzyme recognition sites) into the final product, Gibson assembly is advantageous because it can be seamless.

The Gibson assembly reaction is isothermal and involves three different enzymes: T5 exonuclease, DNA polymerase, and ligase. See, e.g., US 2010/0035768, US 2015/0376628, WO 2015/200334, and Gibson et al. (2009) Nat. Methods 6(5):343-345, each of which is herein incorporated by reference in its entirety for all purposes. The reaction begins with the generation of single-stranded DNA ends due to 5′ to 3′ exonuclease activity by the T5 exonuclease. DNA fragments with complementary single-stranded ends then align by simple base-pairing rules, and DNA polymerase fills gaps moving 5′ to 3′. DNA ligase seals the final nick and a seamless, double-stranded DNA molecule is the result. Complementary ends of forty base pairs have been shown to be effective, and the actual sequence is generally not important. The starting fragments can be generated by PCR, restriction, or direct synthesis.
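The outcome of joining two fragments with homologous ends can be illustrated in silico. The non-limiting sketch below does not model the exonuclease, polymerase, or ligase steps; it only checks that the 3′ end of one fragment and the 5′ end of the next share an identical stretch of top-strand sequence and returns the seamless product. The function name and example fragments are hypothetical, and the 40-bp default simply reflects the forty-base-pair ends mentioned above.

```python
def gibson_join(left: str, right: str, min_overlap: int = 40) -> str:
    """Return the seamless product of two fragments sharing a terminal overlap.

    Both fragments are top-strand DNA sequences written 5' to 3'. The end of
    `left` must match the start of `right` over at least `min_overlap` bases;
    the shared sequence appears only once in the product.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    left, right = left.upper(), right.upper()
    longest = min(len(left), len(right))
    # Prefer the longest shared end so the overlap is never duplicated.
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError(f"fragments do not share an overlap of at least {min_overlap} bp")


# Hypothetical fragments sharing a 40-bp end.
overlap = "ACGT" * 10
product = gibson_join("TTTTT" + overlap, overlap + "GGGGG")
```

Because the shared end is written once in the product, no extra bases (scars) are introduced at the junction, which is the property the text attributes to Gibson assembly.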

Seamless DNA construction is of particular importance when creating transgenic animal lines, as the scars produced by restriction sites or other manipulations can negatively impact gene expression if they land in a region important for regulation. Targeting the mammalian genome often requires construction of large targeting vectors with long DNA arms to direct homologous recombination, as well as antibiotic resistance cassettes for selection of embryonic stem cell clones. Correctly targeted clones often contain multiple scars necessary for construction of the vector, not to mention the resistance cassette itself. For genetic ablation, these lesions may not matter for the end result (a null allele), but there is always the chance that expression by neighboring genes will be adversely affected. For modifications other than knock-out, such as knock-in (e.g., reporters or mutant alleles), faithful expression of the targeted locus is usually important for the studies in question. Gibson assembly can abrogate the need for some of these scars and even facilitate construction of the vector itself in some cases, but unique restriction sites can be difficult to find.

Humanization, the direct replacement of a mouse gene with its human counterpart, in particular requires seamless junctions between mouse and human sequence so that mouse transcription machinery will faithfully replicate expression of the new allele. Care must be taken to bury construction scars and the selection cassette in noncoding regions that do not impact gene regulation. As animal models become more complex, more modifications may be added on top of existing ones, such as human disease-causing mutations on humanized alleles. The additional changes can then add even more scars and another selection cassette to an already highly engineered mouse locus, increasing the likelihood that expression will be altered and the mouse model will not be faithful to human disease. From a construction standpoint, adding a new cassette to a vector already containing one can become complicated due to undesired recombination between shared cassette elements such as promoters and poly(A) signals, even if the two cassettes encode different selections.

In view of these hurdles, we have developed methods to simplify generating targeting vectors carrying multiple changes such as a humanized allele and a disease mutation layered on top. These methods enable easier construction and minimize scars incorporated into the final animal model.

In a first method, a small piece of DNA carrying a desired mutation is synthesized flanked by short (<500 bp) homology arms. A few base pairs downstream of the desired mutation, a 40-50 base pair region is selected and duplicated to create direct repeats to flank rare restriction sites or Cas9 guide RNA target sequences flanking a resistance cassette. This small construct is then homologously recombined with an established mouse targeting vector (such as a humanization targeting vector, with its own resistance cassette) by recombineering technology. After confirmation that the desired mutation is incorporated, the new vector is cut with the rare cutter/Cas9 guide, dropping out the cassette and exposing the 40-50 base pair direct repeats. Gibson assembly then seals the break seamlessly in an intramolecular reaction. The resulting targeting vector now carries the desired mutation and no additional scars or cassette besides the ones originally present in the humanization.
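The layout of the insert and the outcome of the intramolecular step can be sketched as follows. This is a minimal string-level model under stated assumptions: sequences are top strands written 5′ to 3′, the nuclease target site is the AscI recognition sequence GGCGCGCC, and residual site bases left on the cut ends are assumed to be lost during assembly so that only a single copy of the repeat remains. The function names and the toy sequences are hypothetical, and the desired mutation itself (carried upstream of the repeat) is not modeled.

```python
ASCI_SITE = "GGCGCGCC"  # AscI recognition sequence


def build_insert(repeat: str, cassette: str, site: str = ASCI_SITE) -> str:
    """Insert layout, 5' to 3': repeat, site, cassette, site, repeat."""
    return repeat + site + cassette + site + repeat


def resolve_direct_repeats(vector: str, repeat: str, site: str = ASCI_SITE) -> str:
    """Model cassette excision followed by intramolecular assembly.

    Cutting at both sites drops out the cassette; the exposed direct
    repeats are then collapsed to a single copy.
    """
    if vector.count(site) != 2:
        raise ValueError("expected exactly two nuclease target sites")
    first = vector.find(site)
    last = vector.rfind(site)
    left_end = vector[:first]                # ends with the first repeat
    right_end = vector[last + len(site):]    # starts with the second repeat
    if not (left_end.endswith(repeat) and right_end.startswith(repeat)):
        raise ValueError("direct repeats are not exposed at the cut ends")
    return left_end + right_end[len(repeat):]


# Toy sequences (arbitrary placeholders, not taken from any real locus).
upstream = "ATGAAACTG"
repeat = "ACTGATCAGT" * 5      # 50 bp direct repeat
cassette = "TTGACA" * 5        # stands in for the resistance cassette
downstream = "TAAGCT"

modified = upstream + build_insert(repeat, cassette) + downstream
resolved = resolve_direct_repeats(modified, repeat)
```

In this model the resolved sequence is simply the flanking sequence with one copy of the repeat, which mirrors the statement that no additional scars or cassette remain.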

In a specific example, we generated an allele incorporating a point mutation into a targeting construct (a large targeting vector) comprising a humanized Target Gene 1. See FIG. 1. The goal was to efficiently and seamlessly create a large targeting vector comprising a humanized Target Gene 1 with the point mutation instead of having to retarget humanized mouse embryonic stem (ES) cells comprising the humanized Target Gene 1 in order to introduce the point mutation in those humanized mouse ES cells. The initial targeting construct contained the human Target Gene 1 genomic sequence from the start codon to the stop codon, including all introns, to replace the mouse genomic sequence of the corresponding mouse Target Gene 1 from the start codon to the stop codon. In addition, the insert nucleic acid in the targeting construct comprised a self-deleting hygromycin resistance cassette downstream of the poly(A) sequence. This starting humanization vector was then modified as described above with the point mutation and a neomycin resistance cassette, flanked by AscI restriction sites and 50 base pair direct repeats of human Target Gene 1 sequence just downstream of the point mutation. A nucleic acid was then synthesized to comprise an EM7 neomycin cassette flanked by rare restriction sites (AscI) and a 50 bp direct repeat from the exon into which the mutation was to be introduced as well as upstream and downstream homology boxes, including the mutation to be introduced in the upstream homology box. See FIG. 2. The neomycin resistance cassette was inserted into the middle of the exon to be mutated, but because the method is seamless, the exon was recapitulated at the end of the method. The nucleic acid was linearized by cleavage with HindIII, and bacterial homologous recombination was used to insert the linearized synthetic nucleic acid into the large targeting vector comprising the humanized Target Gene 1. See, e.g., US 2004/0018626 and Valenzuela et al. (2003) Nat. Biotechnol. 21(6):652-659, each of which is herein incorporated by reference in its entirety. The neomycin cassette was excised with AscI, which dropped out the neomycin cassette and exposed the direct repeats. The construct was then resealed by intramolecular Gibson assembly, which resolved the direct repeats to a single copy, leaving the exon (now comprising the mutation) intact with no scars. Following Gibson assembly, the reaction was again digested with AscI, in order to cut anything that did not delete the AscI sites during Gibson assembly, thereby reducing background. Final sequencing confirmed the presence of the point mutation and no additional changes from the original targeting vector. The newly modified vector was electroporated into mouse embryonic stem cells, and positive clones were identified by TAQMAN followed by Sanger sequencing to confirm the incorporation of the point mutation.
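The post-assembly AscI digest works as a background filter because only molecules that still carry an AscI site are cut. A comparable in silico check can be sketched as below: it scans a sequence for a recognition site on both strands and reports whether the sequence would survive re-digestion. This is an illustrative sketch only; the helper names are hypothetical, sequences are assumed to use only the letters A, C, G, and T, and the same scan could also be used to confirm that the chosen rare cutter site is absent from the starting vector.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """Return sorted top-strand start positions of a site on either strand."""
    seq, site = seq.upper(), site.upper()
    # A palindromic site (such as AscI, GGCGCGCC) yields a single motif.
    motifs = {site, reverse_complement(site)}
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append(start)
            start = seq.find(motif, start + 1)  # allow overlapping matches
    return sorted(hits)


def passes_redigest(seq: str, site: str = "GGCGCGCC") -> bool:
    """True if the sequence has no site and so would not be cut again."""
    return not find_sites(seq, site)
```

A correctly resolved vector should pass this check, whereas a molecule that retained either AscI site would be linearized by the second digest and contribute less background.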

Example 2. Scarless Introduction of a Targeted Modification into a Large Targeting Vector Via Bacterial Homologous Recombination and Intermolecular Gibson Assembly

In a second method, a desired mutation is introduced into a bacterial artificial chromosome (BAC) DNA in two general steps. In the first step, the region of interest in the BAC (a region spanning about 100-200 bp on each side of the mutation) is deleted by bacterial homologous recombination using a selection cassette flanked by a rare cutter restriction enzyme site on each side. In the second step, the BAC deletion is replaced with the desired mutated sequence by Gibson assembly using a DNA fragment of about 200-500 bp having heterologous 5′ and 3′ ends homologous to the targeted BAC sequence adjacent to the rare cutter site. For this purpose, the targeted BAC from the first step is digested with the rare cutter enzyme, exposing the two ends homologous to the mutated fragment. The restriction enzyme also keeps the targeted BAC open, allowing for a low background reaction without the need to add a selection marker. See FIG. 3. This method is particularly beneficial, for example, when larger fragments that cannot be obtained by PCR (e.g., 15 kb or 30 kb) need to be inserted in a construct. For example, such a large fragment can be cut from its source, such as a BAC (e.g., using CRISPR/Cas9), and then inserted using Gibson assembly into a modified BAC that carries homology to the 5′ and 3′ ends of this fragment, thereby creating the final targeting construct.
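The second step can be sketched as a two-junction design check. The Python example below is a simplified model with several assumptions: the opened BAC is represented linearly as the sequence to the left and to the right of the excised cassette, all sequences are top strands written 5′ to 3′, residual restriction-site bases on the opened ends are ignored, and a 40 bp end homology is used as the default. The function name `assemble_into_opened_vector` and all sequences are hypothetical.

```python
def assemble_into_opened_vector(
    left_arm: str, right_arm: str, fragment: str, overlap: int = 40
) -> str:
    """Model intermolecular assembly of a fragment into an opened vector.

    `left_arm` ends with the 5' homology and `right_arm` starts with the
    3' homology; the fragment must carry the same sequences at its ends.
    """
    if len(fragment) < 2 * overlap:
        raise ValueError("fragment is shorter than its two end homologies")
    if fragment[:overlap] != left_arm[-overlap:]:
        raise ValueError("5' end of fragment does not match the vector")
    if fragment[-overlap:] != right_arm[:overlap]:
        raise ValueError("3' end of fragment does not match the vector")
    # Each homology appears once in the product; the fragment core sits between.
    core = fragment[overlap:len(fragment) - overlap]
    return left_arm + core + right_arm


# Toy sequences (arbitrary placeholders).
h5 = "ACGTTGCA" * 5            # 40 bp homology adjacent to the left cut end
h3 = "TTGACCGA" * 5            # 40 bp homology adjacent to the right cut end
left_arm = "GGGAAA" + h5
right_arm = h3 + "CCCTTT"
mutated_core = "ATGCATGTAAGT"  # stands in for the mutated sequence
fragment = h5 + mutated_core + h3

product = assemble_into_opened_vector(left_arm, right_arm, fragment)
```

The same check applies unchanged to a much larger fragment, such as one excised from a BAC, because only the terminal homologies are inspected.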

In a specific example, this method was used to introduce a splice mutation into a large targeting vector comprising a humanized Target Gene 2. The initial targeting construct contained the wild type Target Gene 2 genomic sequence, including introns, designed to replace the corresponding genomic sequence of the mouse Target Gene 2 from the start codon to before the last exon, including adding a self-deleting neomycin resistance cassette in an intron. This starting humanization vector was then modified as described above with a hygromycin resistance cassette, flanked by AscI restriction sites and forty base pairs of sequence homologous to the regions downstream and upstream of the desired splice mutation. The hygromycin cassette was excised with AscI, and the construct was resealed by intermolecular Gibson assembly with a DNA fragment comprising the splice mutation flanked by heterologous 5′ and 3′ ends homologous to the targeted targeting construct sequence adjacent to the rare cutter site. Final sequencing confirmed the presence of the splice mutation and no additional changes from the original targeting vector. The newly modified vector was electroporated into mouse embryonic stem cells, and a positive clone was identified by TAQMAN, followed by Sanger sequencing to confirm the incorporation of the splice mutation.

In a third method, a human DNA fragment from a bacterial artificial chromosome (BAC) is cut out using CRISPR/Cas9. This human DNA fragment is fused, by Gibson assembly, to a mouse BAC that was previously targeted with a selection cassette. A rare cutter restriction enzyme site is designed in the region where the human fragment is to be integrated. In the targeted mouse BAC, there are 40 bp of homology sequences on each side of this rare cutter restriction site. The homology sequences are homologous to the 5′ and 3′ ends of the human DNA fragment. The final construct is selected with the same antibiotic as the original targeted mouse BAC. Even though no new selection is incorporated in the final Gibson assembly reaction, low background is observed. Addition of the rare restriction enzyme following Gibson assembly keeps the background at a low level.

In a specific example, the above method was used to incorporate an allele comprising the region of Target Gene 3 encoding the ectodomain of Target Protein 3 into the mouse Target Gene 3. The initial targeting construct contained the wild type mouse Target Gene 3 genomic sequence, including introns. A self-deleting neomycin resistance cassette was added by bacterial homologous recombination, deleting the mouse Target Gene 3 ectodomain-encoding region. Upstream of the neomycin resistance cassette, there was an SgrDI restriction site that separates the 5′ and 3′ 40 bp regions of human homology that will interact with the human fragment. All of these sequences were incorporated by bacterial homologous recombination as previously described. A human DNA fragment 32 kb in length was excised from a human BAC by CRISPR/Cas9, leaving the 5′ and 3′ ends exposed for the intermolecular Gibson assembly reaction with the targeted mouse BAC that was opened by SgrDI digestion. The newly modified vector was electroporated into mouse embryonic stem cells, and a positive clone was identified by TAQMAN.

We claim:
 1. A method for introducing a scarless targeted genetic modification in a preexisting targeting vector, comprising: (a) performing bacterial homologous recombination between the preexisting targeting vector and a modification cassette in a population of bacterial cells, wherein the modification cassette comprises the targeted genetic modification and comprises an insert nucleic acid flanked by a 5′ homology arm corresponding to a 5′ target sequence in the preexisting targeting vector and a 3′ homology arm corresponding to a 3′ target sequence in the preexisting targeting vector, wherein the insert nucleic acid comprises from 5′ to 3′: (i) a first repeat sequence; (ii) a first target site for a first nuclease agent; (iii) a selection cassette; (iv) a second target site for a second nuclease agent; and (v) a second repeat sequence identical to the first repeat sequence, wherein the repeat sequence is at least about 20 nucleotides in length; (b) selecting bacterial cells comprising a modified targeting vector comprising the selection cassette; (c) cleaving the first target site in the modified targeting vector with the first nuclease agent and cleaving the second target site in the modified targeting vector with the second nuclease agent to remove the selection cassette and expose the first repeat sequence and the second repeat sequence in the modified targeting vector, wherein step (c) occurs in vitro; and (d) assembling the exposed first repeat sequence with the exposed second repeat sequence in an intramolecular in vitro assembly reaction to generate the targeting vector comprising the scarless targeted genetic modification, wherein neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent are present and only a single copy of the repeat sequence is present in the targeting vector comprising the scarless targeted genetic modification.
 2. The method of claim 1, wherein the repeat sequence is identical to a sequence in the preexisting targeting vector.
 3. The method of claim 1, wherein the targeted genetic modification comprises an insertion, and the repeat sequence is identical to the 5′ end or the 3′ end of the insertion.
 4. The method of claim 1, wherein the repeat sequence is between about 20 nucleotides and about 100 nucleotides in length.
 5. The method of claim 1, wherein the modification cassette is a linear, double-stranded nucleic acid.
 6. The method of claim 1, wherein the modification cassette is from about 1 kb to about 15 kb in length.
 7. The method of claim 1, wherein the 5′ homology arm and the 3′ homology arm are each at least about 35 nucleotides in length.
 8. The method of claim 7, wherein the 5′ homology arm and the 3′ homology arm are each between about 35 nucleotides and about 500 nucleotides in length.
 9. The method of claim 1, wherein the first nuclease agent and/or the second nuclease agent is a rare-cutting nuclease agent.
 10. The method of claim 1, wherein the first target site and/or the second target site is not present in the preexisting targeting vector.
 11. The method of claim 1, wherein the first target site is identical to the second target site, and the first nuclease agent is identical to the second nuclease agent.
 12. The method of claim 1, wherein the first nuclease agent and/or the second nuclease agent comprises a rare-cutting restriction enzyme.
 13. The method of claim 12, wherein the rare-cutting restriction enzyme is NotI, XmaIII, SstII, SalI, NruI, NheI, Nb.BbvCI, BbvCI, AscI, AsiSI, FseI, PacI, PmeI, SbfI, SgrAI, SwaI, BspQI, SapI, SfiI, CspCI, AbsI, CciNI, FspAI, MauBI, MreI, MssI, PalAI, RgaI, RigI, SdaI, SfaAI, SgfI, SgrDI, SgsI, SmiI, SrfI, Sse232I, Sse8387I, LguI, PciSI, AarI, AjuI, AloI, BarI, PpiI, or PsrI.
 14. The method of claim 1, wherein the first nuclease agent and/or the second nuclease agent is a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) protein and a guide RNA (gRNA), a zinc finger nuclease (ZFN), a Transcription Activator-Like Effector Nuclease (TALEN), or an engineered meganuclease.
 15. The method of claim 14, wherein the first nuclease agent and/or the second nuclease agent is the Cas protein and the gRNA, wherein the Cas protein is Cas9, and wherein the gRNA comprises a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA).
 16. The method of claim 1, wherein the targeted genetic modification comprises a modification in the 5′ homology arm or the 3′ homology arm.
 17. The method of claim 1, wherein the targeted genetic modification comprises a modification in the insert nucleic acid.
 18. The method of claim 1, wherein the targeted genetic modification comprises a point mutation, a deletion, an insertion, a replacement, or a combination thereof.
 19. The method of claim 1, wherein the selection cassette imparts resistance to an antibiotic.
 20. The method of claim 19, wherein the selection cassette imparts resistance to ampicillin, chloramphenicol, tetracycline, kanamycin, spectinomycin, streptomycin, carbenicillin, bleomycin, erythromycin, or polymyxin B.
 21. The method of claim 1, wherein the preexisting targeting vector is a large targeting vector at least about 10 kb in length.
 22. The method of claim 21, wherein the preexisting targeting vector is at least about 100 kb in length.
 23. The method of claim 1, wherein the preexisting targeting vector comprises a second selection cassette.
 24. The method of claim 23, wherein the second selection cassette imparts resistance to an antibiotic.
 25. The method of claim 24, wherein the selection cassette in the modification cassette and the second selection cassette in the preexisting targeting vector each impart resistance to a different antibiotic.
 26. The method of claim 23, wherein the second selection cassette allows for selection in both bacterial and mammalian cells.
 27. The method of claim 1, wherein step (d) comprises: (i) contacting the modified targeting vector with an exonuclease to expose complementary sequences between the first repeat sequence and the second repeat sequence; (ii) annealing the exposed complementary sequences; (iii) extending the 3′ ends of the annealed complementary sequences; and (iv) ligating the annealed complementary sequences.
 28. The method of claim 27, wherein step (d) comprises incubating the modified targeting vector with an exonuclease, a DNA polymerase, and a DNA ligase.
 29. The method of claim 1, further comprising: (e) treating the targeting vector with the first nuclease agent and the second nuclease agent following the in vitro assembly in step (d) to verify that neither the first target site for the first nuclease agent nor the second target site for the second nuclease agent are present.